Commit Graph

14592 Commits

Author SHA1 Message Date
nvnbagrov
b7332b058c [Model] Nano Nemotron VL - fast media preprocessing (#35657)
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
2026-03-08 03:04:05 -07:00
Andreas Karatzas
40077ea3de [CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-08 14:42:24 +08:00
Samuel Shen
5d6aae4577 [LMCache MP Patch]: Race Condition + Duplicated Block Ids (#35831) 2026-03-07 13:52:48 -08:00
Roy Huang
63298ee173 [Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode (#35931) 2026-03-07 13:52:35 -08:00
Richard Zou
2dde535df1 [compile] Split compile/warmup monitoring (#36098) 2026-03-07 13:52:11 -08:00
Wei Zhao
379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) 2026-03-07 13:51:54 -08:00
PatchyTIS
a6be75dbd2 [Core] NGram GPU Implementation compatible with Async Scheduler (#29184) 2026-03-07 13:51:37 -08:00
Micah Williamson
ee54f9cdb9 [ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224) 2026-03-07 13:50:52 -08:00
Micah Williamson
fc4657756f [ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174) 2026-03-07 13:50:17 -08:00
qli88
eebd14651f [CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416) 2026-03-07 13:49:56 -08:00
Matthew Bonanni
ebb9cc5f2b [UX][Startup] Account for CUDA graphs during memory profiling (#30515) 2026-03-07 13:49:23 -08:00
rahul-sarvam
85f50eb41f Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
2026-03-08 01:16:24 +08:00
Taneem Ibrahim
5261223c2d [Misc] Remove duplicate parser registration (#36303)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-03-07 09:37:01 -05:00
lif
00b814ba5a [V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
vllmellm
ee8a29511f [Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-03-07 09:26:59 +00:00
milesial
755356b3d1 feat: expose media_io_kwargs at runtime (#34778)
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
2026-03-07 04:27:04 +00:00
Andreas Karatzas
58928475e4 [ROCm][CI] Making entrypoints more deterministic on ROCm (#36293) 2026-03-06 19:04:40 -08:00
Mengtao (Martin) Yuan
1a9718085c Fix CUDA graph decode capture crash in AITER FlashAttention (#36042)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
2026-03-06 18:12:07 -08:00
Kunshang Ji
7eb524e64c refine vllm bench throughput --backend hf (#35971)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-07 02:10:33 +00:00
Nick Hill
c7f32e08c2 [BugFix] Avoid ignored trust_remote_code warnings (#36290)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-07 01:24:18 +00:00
Nick Hill
b354686524 [Model Runner V2] Fix warmup for pipeline parallel (#36280)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-06 16:58:51 -08:00
Nick Hill
6a18d8789b [Core] Fix benign error log during normal shutdown (#36270)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2026-03-07 00:39:21 +00:00
Itay Alroy
24a03915f5 mla: don't update kv cache on dummy forwards (#36282)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
2026-03-07 00:36:00 +00:00
Andreas Karatzas
b5e34e1fca [ROCm][CI] Fixing yaml file for external amd-ci signal (#36284)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 18:30:39 -06:00
Copilot
ce8546a12b [docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-03-06 23:55:06 +00:00
Chuan (Richard) Li
c188749bcd [ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) (#35850)
Signed-off-by: Li <chuali@amd.com>
2026-03-06 20:24:03 +00:00
Alexei-V-Ivanov-AMD
225d1090a0 Enabling some B200-specific tests on MI355 (#35253)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2026-03-06 19:27:20 +00:00
eellison
f3c6c9c9d7 [CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
2026-03-06 10:53:37 -08:00
Nick Hill
26bd43b52d Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262) 2026-03-06 08:28:09 -08:00
Travis Johnson
6b625a8807 [Bugfix] Quickfix followups to busy loop removal in #28053 (#36068)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-06 08:13:05 -08:00
Richard Zou
54756b6109 [compile] Stop unconditionally patching constrain_to_fx_strides (#36152)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-06 10:17:27 -05:00
Raphaël Rialland
39f9ea0da4 [Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) (#36165) 2026-03-06 09:15:31 -05:00
Isotr0py
e4ae148a78 [Refactor] Modular video loader backend refactoring (#35202)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-06 06:06:59 -08:00
Isotr0py
1d0c0d209c [Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-06 06:06:45 -08:00
Chenguang Zheng
fcb73f306c [bugfix] add api process rank in default multimodal request (#36150)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
2026-03-06 12:00:09 +00:00
Harry Mellor
e2090bf3af [CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-06 11:50:28 +00:00
Andreas Karatzas
2a00d3241f [CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 01:17:08 -08:00
Alex Brooks
10f4db4dbe [Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-06 01:16:56 -08:00
Nicolò Lucchesi
5b3ba94ab4 [Core][KVConnector] Support HMA+NixlConnector (#35758)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-06 08:51:21 +01:00
zhanqiuhu
90f3c01fa4 [Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-06 08:50:44 +01:00
Andreas Karatzas
807d680337 [ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 15:15:12 +08:00
Tyler Michael Smith
5afb387bd4 Change "following fields were present in the request but ignored" log from warn to debug (#36173)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-03-05 22:15:46 -08:00
Walter Beller-Morales
43e77e59ab [BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-05 22:15:29 -08:00
Russell Bryant
00bd08edee [Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-05 22:15:19 -08:00
Ajay Anubolu
43f10573c9 [Bugfix] Fix misleading context length error messages (#36197)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 22:15:12 -08:00
Yongye Zhu
86e1060b17 [Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-03-05 22:04:44 -08:00
Mark McLoughlin
27066d1b2b [Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-05 22:04:31 -08:00
cong-or
57c84ff129 perf: add __slots__ to KVCacheBlock (#36164)
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
2026-03-05 22:04:09 -08:00
Xiang Shi
e68de8adc0 docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
2026-03-06 06:01:02 +00:00
Andreas Karatzas
a1ffa56a1e [CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 05:07:29 +00:00