Commit Graph

14717 Commits

Author SHA1 Message Date
Harry Mellor
d5816c8c2f Fix tied weights in weight mapping test for Transformers v5 (#36788)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 15:10:26 +00:00
Woosuk Kwon
8ccbcda5c0 [Model Runner V2] Remove unused warmup_for_prefill method (#36762)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-11 08:02:44 -07:00
tvirolai-amd
a9e532afe2 [ROCm][Perf] Allow MTP lens > 1 in Sparse MLA (#36681)
Signed-off-by: Teemu Virolainen <teemu.virolainen@amd.com>
2026-03-11 14:43:03 +00:00
Harry Mellor
f3163bba67 Disable docs build skipping until a better solution is found (#36790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 13:53:23 +00:00
Martin Hickey
700a1ddc65 [Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2026-03-11 13:37:46 +00:00
Silvia Colabrese
f33251ffc8 [Bugfix] Fix Mistral-small --format (#36782)
Signed-off-by: 12010486 <silvia.colabrese@intel.com>
2026-03-11 04:47:52 -07:00
Wuxun Zhang
e584dce52b Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
2026-03-11 19:19:15 +08:00
Ning Xie
40c0461f24 [openapi] refactor render related openapi [3/N] (#36749)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-03-11 03:14:34 -07:00
Weiguang Li
724759684c [Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 03:13:06 -07:00
Michael Goin
9c34e9d24f Disable cascade attention by default (#36318) 2026-03-11 03:12:23 -07:00
Richard Zou
09b6f99852 [compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-11 03:12:03 -07:00
Ethan T.
c87fb515ed fix(lora): use replaced_module_name in pooling model name check (#36402)
Signed-off-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 03:11:27 -07:00
Itay Alroy
5353c9b016 platforms: Fix Ray DP startup crash (#36665)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
2026-03-11 03:08:55 -07:00
Angela Yi
13e79fc811 [ci] Update rtol for test_classification (#36556)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
2026-03-11 03:08:16 -07:00
Rahul Tuli
9d07a3d6e4 Add: Eagle3 support for Qwen3.5 (#36658)
Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
2026-03-11 03:07:42 -07:00
Cyrus Leung
646b85544b [Refactor] Remove Molmo2 processor wrapper (#36667)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-11 03:07:20 -07:00
tc-mb
4286cc5ec2 fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751)
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
2026-03-11 03:06:28 -07:00
LoganJane
545d18d81b [Bugfix] Support other quantization methods in glm41v (#36321)
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-11 09:48:05 +00:00
roikoren755
e661b9ee83 [NemotronH] Small fix reasoning parser (#36635)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-11 02:44:41 -07:00
YiSheng5
c910eeb125 [XPU]Bug fix for some unexpected error when use AgRs backend on XPU device. (#36593)
Signed-off-by: yisheng <yi.sheng@intel.com>
2026-03-11 09:17:46 +00:00
Harry Mellor
f4ae58b38b Remove unused config field from Gemma2 (#36672)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 01:51:19 -07:00
Isotr0py
e568cf88bc [UX] Infer dtype for local checkpoint (#36218)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-11 08:50:04 +00:00
Nicolò Lucchesi
098d844731 [NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-11 01:11:23 -07:00
JartX
a40ee486f2 [Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: akaratza <akaratza@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-03-11 07:45:57 +00:00
pschlan-amd
eac2dc2b41 AITER MLA backend: Avoid CPU sync in _build_decode (#35765)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
2026-03-11 07:25:00 +00:00
Flora Feng
d5080aeaa4 [Refactor] Remove deadcode in Responses API serving (#36726)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-11 07:11:41 +00:00
liuzhenwei
f22d6e0267 [Hardware][NIXL] set default kv buffer type for different platform (#36438)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-11 05:19:28 +00:00
Kunshang Ji
76c6e6da08 [XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-10 21:54:09 -07:00
typer-J
4184653775 feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-03-10 21:51:39 -07:00
Sladyn
4aaaf8c8ce feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
2026-03-11 04:35:33 +00:00
Hongbin Guo
4bf533623b [Doc] Fix duplicate words in comments (#36713)
Signed-off-by: Hongbin10 <jdmjdm1998@163.com>
2026-03-10 21:28:31 -07:00
Matthew Bonanni
5f77ef15ae [Misc][Attention] Clean up unused method in CPU_ATTN (#36673)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-10 21:27:22 -07:00
elvischenv
7d6abdd022 [Fix] Use torch.empty for output in attention+quant fusion (#31785)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-03-10 21:26:14 -07:00
Wentao Ye
a8ff2cca92 [Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-10 21:25:30 -07:00
tunglinwood
42fadebecb [Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
2026-03-10 21:24:48 -07:00
tianshu-Michael-yu
a197eda9c3 Add tuned H100 MoE configs for LFM2 8B and 24B (#36699) 2026-03-10 21:22:02 -07:00
Kevin H. Luu
82b110d50e [ci] Bound nvidia-cudnn-frontend version (#36719)
Signed-off-by: khluu <khluu000@gmail.com>
2026-03-11 12:17:35 +08:00
Benjamin Chislett
9040cd40af [DSV3.2][MTP] Optimize Indexer MTP handling (#36723)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-11 12:16:56 +08:00
fangyuchu
fa0d353acf [Bugfix] Surface exceptions from non-blocking execute_model in UniProcExecutor to avoid DP deadlocks (#35194)
Signed-off-by: fangyuchu <fangyuchu@qq.com>
2026-03-11 03:22:21 +00:00
Augusto Yao
b386bb3d7c fix bugs when token_classify & classify run concurrently (#36614)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2026-03-10 20:16:34 -07:00
Ning Xie
fe714dd507 [openapi server] log exception in exception handler(2/N) (#36201)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-03-10 20:16:30 -07:00
Matthew Bonanni
8ab3d7427c [Bugfix] Fix DeepSeek V3.2 OOM during CG memory profiling (#36691)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-11 03:01:07 +00:00
Wei Zhao
84e436ed1c [Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-03-10 22:04:47 -04:00
Andreas Karatzas
81939e7733 [ROCm][CI] Making some tests optional to reduce workload (#36090)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-10 16:45:27 -07:00
Woosuk Kwon
195d1ca3e8 [Minor] Enhance error message for TRTLLM decode uniformity check (#36609)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-10 15:38:45 -07:00
Nick Hill
8d983d7cd6 [Model Runner V2] Add initial CI tests (#36041)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-10 14:55:21 -07:00
Nick Hill
65b2f405dc [Core] Simplify core kv-cache blocks initialization logic (#36521)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-10 20:20:02 +00:00
Nick Hill
2a68464c5b [Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-10 11:17:26 -07:00
Zhengxu Chen
bdd8981dab [compile] Apply stored functorch config while finalizing loaded artifacts. (#36582)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2026-03-10 09:34:35 -07:00
Woosuk Kwon
f088a831dd [Model Runner V2] Use unpadded num_tokens for PW CUDA graph attn metadata (#36626)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-10 09:30:56 -07:00