Commit Graph

15202 Commits

Author SHA1 Message Date
Nick Hill
4e824d1c83 [Model Runner V2][Minor] Simplify PP logic (#38031)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-24 13:57:17 -07:00
amey asgaonkar
0c1809c806 Add Ubuntu 24.04 support for Docker builds (#35386)
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>
2026-03-24 13:34:44 -07:00
liangel-02
8c47fdfdb1 [FlexAttention] allow custom mask mod (#37692)
Signed-off-by: Angel Li <liangel@meta.com>
2026-03-24 16:03:24 -04:00
Javier De Jesus
54b0578ada [Bugfix] Pass hf_token through config loading paths for gated model support (#37920)
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
2026-03-24 15:22:05 -04:00
Richard Zou
89f572dbc0 [BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-24 19:08:26 +00:00
Richard Zou
71a4a2fbd0 [BugFix] Fix order of compile logging (#38012)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-24 18:58:18 +00:00
Nick Cao
935c46dd9b [Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
2026-03-24 18:23:41 +00:00
Willy Hardy
057fc94cbd [Bugfix] Fix structured output crash on CPU due to pin_memory=True (#37706)
Signed-off-by: Willy Hardy <whardy@redhat.com>
Signed-off-by: Will Hardy <whardy@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 17:44:17 +00:00
Vineeta Tiwari
b58c5f28aa docs: fix broken offline inference paths in documentation (#37998)
Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com>
Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com>
Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-24 17:35:14 +00:00
Ming Yang
c07e2ca6e0 Fix Mamba state corruption from referencing stale block table entries (#37728) (#37728) (#37728)
2026-03-24 10:29:59 -07:00
Dhruv Singal
4df5fa7439 [Bugfix] Force continuous usage stats when CLI override is enabled (#37923)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: OpenCode <noreply@openai.com>
2026-03-24 10:29:50 -07:00
sihao_li
a5416bc52e [XPU] Support Intel XPU hardware information collection in usage stats (#37964)
Signed-off-by: sihao.li <sihao.li@intel.com>
2026-03-24 10:29:17 -07:00
Harry Mellor
b3601da6e7 [Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 17:14:01 +00:00
Dan Blanaru
dc78c2c933 [Core] add option to schedule requests based on full ISL (#37307)
Signed-off-by: Dan Blanaru <48605845+DanBlanaru@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-24 13:01:12 -04:00
Sungjae Lee
4731884796 [Feature] limit thinking tokens (hard limit) (#20859)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 09:53:07 -07:00
Harry Mellor
8de5261e69 Update new contributor message (#37999)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 16:01:41 +00:00
wang.yuqi
1b6cb920e6 [Deprecate] Deprecate pooling multi task support. (#37956)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-03-24 14:07:47 +00:00
Li, Jiang
352b90c4a4 [Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-03-24 07:00:20 -07:00
Sage
1c0aabdeb0 [Bugfix] Suppress spurious CPU KV cache warning in launch render (#37911)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2026-03-24 12:36:18 +00:00
Ilya Markov
14acf429ac [EPLB] Remove main waits in case of slow EPLB (#36271)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-03-24 11:50:44 +00:00
Harry Mellor
ce57fd5557 [Docs] Fix build (#37991)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 03:20:49 -07:00
Flora Feng
2e67fa756d Fix tool_parser_cls type annotation from Callable to type[ToolParser] (#37957)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-23 22:58:27 -07:00
Ronen Schaffer
e3c6c10cad [KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (#37874)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-03-24 07:02:51 +02:00
jetxa
16a664df24 [Frontend][Bugfix] Pass default_chat_template_kwargs to AnthropicServingMessages (#37899)
Signed-off-by: jetxa <jetxzhang@outlook.com>
2026-03-24 05:00:12 +00:00
Kevin H. Luu
7281199a8c [release] Move agent queue to Release cluster queues (#37783)
Signed-off-by: khluu <khluu000@gmail.com>
2026-03-23 20:36:47 -07:00
Kevin H. Luu
b2dd75eb48 Downsize CPU jobs to use small queue (#37913)
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-23 20:36:37 -07:00
Wentao Ye
c59a132f96 [V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 20:10:11 -07:00
Andreas Karatzas
de99d91ece [ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (#37906)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-24 09:48:37 +08:00
Wentao Ye
83c9d525b6 [CI] Add batch invariant test: Block FP8 + small MOE (#37895)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 21:16:14 -04:00
Giancarlo Delfin
8f4824b664 [Model Runner V2] Gather multimodal embeddings before draft model postprocess (#37932)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
2026-03-23 18:14:13 -07:00
roikoren755
56777b5c89 [Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-23 17:49:56 -07:00
Kevin H. Luu
2488a82f89 [CI] Split V1 Others into 3 separate jobs (#37016)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 06:44:38 +08:00
Ranran
dc6908ac6a [Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-23 18:31:14 -04:00
yzong-rh
e85f8f0932 [Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts (#36728)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-03-23 17:02:57 -04:00
Robert Shaw
5bf3c42d4c [Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-23 20:19:06 +00:00
Kyle Sayers
38364a7e32 [Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-23 16:03:29 -04:00
Matthew Bonanni
fafe76b4af [Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2026-03-23 15:37:22 -04:00
Woosuk Kwon
ffb5b32b5f [MRV2] Consider spec decoding in warmup (#37812)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-23 17:45:43 +00:00
Kunshang Ji
91fd695b75 [CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-23 10:37:56 -07:00
Nicolò Lucchesi
1cbbcfe8a3 [CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-23 23:58:19 +08:00
Angela Yi
aceadb5ee1 Use lazy graph module during split_module to defer recompile() (#37609)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-03-23 11:21:29 -04:00
Yufeng He
ec2280611a [Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884)
2026-03-23 15:15:12 +00:00
yanghui1-arch
7151ae6528 [Bugfix] RoBERTa position_id accumulation in CUDA graph padding region (#37873)
Signed-off-by: dass90 <3053034939@qq.com>
2026-03-23 14:59:21 +00:00
Wentao Ye
45bd5c8e75 [Mypy] Fix mypy for vllm/config (#37808)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 14:33:59 +00:00
Zhaodong Bing
10a1018c12 [ROCm] fix sleep mode not releasing GPU memory problem on ROCm (#37533)
Signed-off-by: bingzhaodong <aaab8b@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-03-23 06:07:19 -07:00
Jee Jee Li
aec2dc6c0d [Bugfix][LoRA] Fix incorrect LoRA Log (#37877)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-23 11:42:52 +00:00
DorBernsohn
7938d12119 [Bugfix] Fix CPU backend crash in KV cache block zeroing (#37550)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
2026-03-23 11:35:45 +00:00
Kunshang Ji
debd6e768c [XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-23 11:10:41 +00:00
Andrew Xia
9ace378a63 [Frontend][Responses API] Fix arrival_time recording for TTFT on initial request (#37498)
Signed-off-by: Andrew Xia <axia@meta.com>
2026-03-23 09:58:08 +00:00
Kunshang Ji
27d5ee3e6f [FP8]add FP8 WoQ kernel abstraction. (#32929)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
2026-03-23 09:47:47 +00:00