biondizzle/vllm - vllm

Author	SHA1	Message	Date
Olya Kozlova	598190aac3	[fix] Remove trtllm ragged mla prefills (#36540 ) Signed-off-by: Olya Kozlova <okozlova@nvidia.com>	2026-03-31 12:30:27 -07:00
Xu Jinyang	b779eb3363	[Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343 ) Signed-off-by: AuYang <459461160@qq.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2026-03-31 23:03:24 +04:00
BadrBasowid	077a9a8e37	[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373 ) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-31 14:15:50 -04:00
Run Yu	07edd551cc	[CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766 ) Signed-off-by: Run Yu <yurun00@gmail.com>	2026-03-31 18:05:14 +00:00
mikaylagawarecki	7c080dd3c5	[4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-31 10:21:13 -07:00
Yi Liu	0dd25a44ea	[Quantization][Autoround][XPU] Add `W4A16` Support (#37986 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-03-31 16:48:24 +00:00
SandishKumarHN	3896e021a0	[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010 ) Signed-off-by: SandishKumarHN <sandish@fb.com>	2026-03-31 12:22:26 -04:00
zhang-prog	b6e636c12c	[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com> (tag: v0.18.2rc0)	2026-03-31 15:50:41 +00:00
Jingu Kang	f1ff50c86c	[Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501 ) Signed-off-by: Jingu Kang <jg.k@navercorp.com>	2026-03-31 17:35:51 +02:00
Matthew Bonanni	757068dc65	[Bugfix][Async] Fix async spec decoding with hybrid models (#38556 ) Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>	2026-03-31 11:08:54 -04:00
Nicolò Lucchesi	7337ff7f03	[Docs] PD with Nixl compat matrix (#38628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 15:01:21 +00:00
Kyle Sayers	5869f69c5f	[Online Quant] [QeRL] Minor code cleanup (#38574 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-31 14:56:43 +00:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
wenjun liu	e8057c00bc	[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-31 22:23:18 +08:00
Nicolò Lucchesi	7430389669	[Bugfix][CI] Skip flaky `test_eagle` test (#38566 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 09:42:37 -04:00
ElizaWszola	202f147cf2	Fix MLA runs when use_inductor_graph_partition=True (#38631 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2026-03-31 13:37:43 +00:00
Jiangyun Zhu	ea7bfde6e4	[CI] fix LM Eval Qwen3.5 Models (B200) (#38632 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-31 13:20:08 +00:00
sihao_li	d71a15041f	[XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596 ) Signed-off-by: sihao.li <sihao.li@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 12:49:43 +00:00
Ilya Markov	abdbb68386	[EPLB] Add alternative communication for EPLB weight exchange (#33176 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com>	2026-03-31 08:17:12 -04:00
liuzhenwei	0c63739135	[EPD] update EPD script arguments (#36742 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-31 12:02:09 +00:00
wang.yuqi	719735d6c5	[CI Failure] pin colmodernvbert revision (#38612 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-31 10:54:54 +00:00
Maosheng Liao	aae3e688f8	Fix document of torchrun_example.py (#31113 )	2026-03-31 10:54:23 +00:00
Matthew Bonanni	7d65463528	[WIP][CI][Bugfix] Fix `test_run_eagle_dp` (#38584 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-31 12:30:25 +02:00
Mateusz Sokół	8278825b57	DOC: TPU mention fix (#38129 ) Signed-off-by: Mateusz Sokół <mat646@gmail.com>	2026-03-31 03:27:56 -07:00
Chang Su	acf7292bf2	[Misc] Move --grpc CLI argument into make_arg_parser (#38570 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-31 03:24:05 -07:00
Chauncey	ce884756f0	[Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-31 08:45:57 +00:00
wang.yuqi	d9d21eb8e3	[Frontend][3/n] Improve pooling entrypoints \| scoring. (#28631 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-31 07:52:00 +00:00
Yintong Lu	f09daea261	[CPU] Support int8 compute mode in CPU AWQ (#35697 ) Signed-off-by: Yintong Lu <yintong.lu@intel.com>	2026-03-31 15:27:37 +08:00
Kevin H. Luu	42318c840b	[ci] Remove benchmarks job (#38611 )	2026-03-31 06:46:21 +00:00
zhangyiming	1ac6694297	[OOT] Add OOT support for linear kernel. (#37989 ) Signed-off-by: menogrey <1299267905@qq.com>	2026-03-31 14:33:21 +08:00
Kfir Toledo	6cc7abdc66	[kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message (#38554 ) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-31 06:00:40 +00:00
Flora Feng	d53cb9cb8e	[Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 13:41:36 +08:00
Louie Tsai	44eef0ca1e	vLLM Benchmark Suite perf regression after PR#32723 (#38576 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-03-31 05:23:17 +00:00
Andreas Karatzas	b9cdc85207	[ROCm][CI] Fix Whisper translation test attention backend selection (#38508 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-31 13:21:49 +08:00
Flora Feng	3e802e8786	[Mypy] Fix adjust_request typing (#38264 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 04:21:18 +00:00
Martin Hickey	350af48e14	[KVConnector] Remove redundant method KVConnectorOutput::merge() (#38546 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-03-31 07:11:02 +03:00
Lucas Kabela	e31915063d	[Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-31 01:08:11 +00:00
Flora Feng	29e48707e8	[Refactor] Consolidate Tool type alias in tool_parsers/utils.py (#38265 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 00:55:51 +00:00
sungsoo ha	4ac227222f	[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism (#36070 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 20:20:43 -04:00
Vadim Gimpelson	bb51d5b40d	Add @vadiklyutiy as committer (#38589 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-31 07:50:04 +08:00
Prathmesh Bhatt	93b3ec1585	feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… (#36466 ) Signed-off-by: Prathmesh Bhatt <71340361+Prathmesh234@users.noreply.github.com>	2026-03-30 23:16:09 +00:00
Netanel Haber	e812bf70bd	Restore non-hf processor path for Nano-Nemotron-VL (bypass `call_hf_processor_mm_only`) - fixes #38018 (#38567 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>	2026-03-30 21:56:52 +00:00
SandishKumarHN	bcc6f67447	[Bugfix] Use null block (0) for padded block table entries (#35431 ) Signed-off-by: SandishKumarHN <sandish@fb.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-30 14:02:51 -07:00
Asaf Gardin	1fc69f59bb	[Bug fix][Quantization] Fix dummy weight loading (#38478 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-03-30 16:38:02 -04:00
Micah Williamson	d9c7db18da	[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-30 20:26:46 +00:00
Ilya Markov	12701e8af2	[EPLB] Optmize eplb mapping and record in router for prefill (#36261 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-03-30 19:48:33 +00:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
mikaylagawarecki	ab1a6a43fa	[3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI (#37221 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-30 11:20:13 -07:00
fangyuchu	b5e608258e	[Refactor] Unify engine process monitoring in engine manager and add Ray backend support (#35862 ) Signed-off-by: fangyuchu <fangyuchu@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-30 10:16:09 -07:00
Matthew Bonanni	2c734ed0e0	[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM (#38562 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-30 09:51:24 -07:00

15412 Commits