biondizzle/vllm - vllm

Author	SHA1	Message	Date
leo-cf-tian	2754231ba3	[Kernel] Add FlashInfer MoE A2A Kernel (#36022 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Leo Tian <lctian@nvidia.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>	2026-03-15 23:45:32 -07:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Nick Hill	36735fd772	[BugFix] Fix multiple/duplicate stdout prefixes (#36822 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-12 12:23:21 +08:00
Vadim Gimpelson	4ff8c3c8f9	[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-10 03:32:20 -07:00
wang.yuqi	dcf8862fd4	[Examples][1/n] Resettle basic examples. (#35579 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:22:53 -07:00
Walter Beller-Morales	43e77e59ab	[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-05 22:15:29 -08:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Richard Zou	e82fbeec7b	[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-01 21:44:22 +00:00
Chauncey	7e08c22b8c	[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#35271 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-02-28 10:12:00 +00:00
Andreas Karatzas	94029ffaf0	[ROCm] Derive device capability from GCN arch string without CUDA init (#35069 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-28 13:55:28 +08:00
Michael Goin	4fec53cfcb	[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274 )	2026-02-26 17:58:03 -07:00
Tyler Michael Smith	eb19955c37	[WideEP] Remove pplx all2all backend (#33724 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:30:10 -08:00
Xinyu Chen	35d44b4557	[XPU]Support CUDAGraph on XPU Platform (#34482 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-24 22:22:52 -08:00
Kunshang Ji	8ad54a991b	[Platform] Add current_platform.num_compute_units interface (#35042 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-02-24 22:22:49 -08:00
danisereb	9609b1f18d	Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-24 08:45:13 -07:00
Harry Mellor	28c5e69ba0	Enforce that `model` is the first positional arg when `--served-model-name` is used (#34973 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:38:05 -08:00
Neil Schemenauer	54e2f83d0a	[Feature] Lazy import for the "mistral" tokenizer module. (#34651 ) Signed-off-by: Neil Schemenauer <nas@arctrix.com>	2026-02-23 00:43:01 -08:00
Manrique Vargas	ad5aa6bd9f	fix(docs): fix typos in comments and docstrings (#34836 ) Signed-off-by: machov <mv1742@nyu.edu>	2026-02-18 23:17:41 -08:00
ElizaWszola	a88b3be7c4	[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-17 23:35:04 -08:00
Jongseok Park	c656ba3b4d	[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538 ) Signed-off-by: js_park <cakeng@naver.com> Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com> Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-17 23:14:30 +00:00
Cyrus Leung	574fe75245	[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 05:29:01 -08:00
Andreas Karatzas	a0638d052d	[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 20:01:42 -08:00
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
Cyrus Leung	c9a1923bb4	[Plugin] Simplify IO Processor Plugin interface (#34236 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:47:39 -08:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-09 06:42:38 +00:00
Wentao Ye	67a746e87f	[Log] Optimize duplicate startup log (#33944 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 17:49:56 +00:00
Xinyu Chen	e969a169ef	support view_from_cpu_tensor on XPU (#33868 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-02-06 08:34:20 +00:00
Xin Yang	79028d4388	[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568 )	2026-02-05 20:34:00 -05:00
Tsukasa OI	92e7562a99	[Bugfix] Suppress non-TTY color output on the process name part of the log (#29714 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2026-02-05 18:47:09 +00:00
jiahanc	59a5cb387a	[perf] Integrate flashinfer concat_mla_k (#31171 )	2026-02-05 05:23:11 -05:00
Cyrus Leung	83449a5ff0	[Refactor] Clean up pooling serial utils (#33665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-03 10:29:18 +00:00
Cyrus Leung	21997f45b1	[Redo] #33110 with threading limit (#33502 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>	2026-02-01 09:18:11 +00:00
Roy Wang	63c0889416	[Misc] Fix flashinfer related tests (#33462 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-31 16:10:24 -05:00
Cyrus Leung	f0a1c8453a	[Frontend] Use new Renderer for Completions and Tokenize API (#32863 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 04:51:15 -08:00
Pavani Majety	c3a9752b0c	[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-01-30 10:30:46 -08:00
Linda	0493d897c4	[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-29 10:00:13 -08:00
Didier Durand	31b25f6516	[Doc]: fixing multiple typos in diverse files (#33256 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 16:52:03 +08:00
Michael Goin	ca1969186d	[UX] Enable nested configs in config yaml files (#33193 )	2026-01-28 16:54:25 -05:00
Angela Yi	4197168ea5	[ez] Remove checks for torch version <= 2.8 (#33209 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 16:03:56 -05:00
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
7. Sun	0ccecf8833	[Tests] Standardize RNG seed utility across test files (#32982 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-01-24 06:47:14 +00:00
ElizaWszola	a28b94e6ef	[Performance] Split FlashAttn attention and cache update (#25954 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 17:28:06 -08:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Isotr0py	8ebf271bb6	[Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (#32746 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-22 16:37:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Cyrus Leung	2b8a38b6d6	[Model] Extend `collect_children` and `no_init_weights` contexts (#32757 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 08:20:27 +00:00
Yanwen Lin	6bb2bc71e2	[Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-01-21 09:35:55 +00:00
Chauncey	c4e5bdf61b	[Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-20 18:48:07 +08:00

288 Commits