5112 Commits

Author SHA1 Message Date
Matthew Bonanni
757068dc65 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
2026-03-31 11:08:54 -04:00
wliao2
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
Nicolò Lucchesi
7430389669 [Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 09:42:37 -04:00
Jiangyun Zhu
ea7bfde6e4 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-03-31 13:20:08 +00:00
Ilya Markov
abdbb68386 [EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
2026-03-31 08:17:12 -04:00
wang.yuqi
719735d6c5 [CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-31 10:54:54 +00:00
Matthew Bonanni
7d65463528 [WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-31 12:30:25 +02:00
wang.yuqi
d9d21eb8e3 [Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-31 07:52:00 +00:00
Yintong Lu
f09daea261 [CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
2026-03-31 15:27:37 +08:00
zhangyiming
1ac6694297 [OOT] Add OOT support for linear kernel. (#37989)
Signed-off-by: menogrey <1299267905@qq.com>
2026-03-31 14:33:21 +08:00
Flora Feng
d53cb9cb8e [Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-31 13:41:36 +08:00
Andreas Karatzas
b9cdc85207 [ROCm][CI] Fix Whisper translation test attention backend selection (#38508)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-31 13:21:49 +08:00
SandishKumarHN
bcc6f67447 [Bugfix] Use null block (0) for padded block table entries (#35431)
Signed-off-by: SandishKumarHN <sandish@fb.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-30 14:02:51 -07:00
Micah Williamson
d9c7db18da [ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-03-30 20:26:46 +00:00
Ilya Markov
12701e8af2 [EPLB] Optimize eplb mapping and record in router for prefill (#36261)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-03-30 19:48:33 +00:00
Benjamin Chislett
494636b29d [Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-30 15:03:15 -04:00
Chendi.Xue
3b1dbaad4e [HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-30 16:47:30 +00:00
Johnny
b4a2f3ac36 [NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
2026-03-30 09:36:18 -07:00
roikoren755
8e6293e838 [Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-30 12:33:49 -04:00
Hongxia Yang
dbdd9ae067 [ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
2026-03-30 15:49:23 +00:00
Matthias Gehre
e8b055a5ac [Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-03-30 07:30:52 -07:00
Andreas Karatzas
677424c7ac [Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-30 04:58:53 -07:00
Collin McCarthy
1031c84c36 Fix ambiguous num_blocks for hybrid attn mamba (#37236)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2026-03-30 11:09:45 +00:00
aliialsaeedii
7e76af14fa [Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253)
Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>
2026-03-30 10:26:46 +00:00
yzong-rh
3683fe6c06 [Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls (#38158)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-30 10:12:13 +00:00
Nicolò Lucchesi
cc06b4e86b [Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-30 09:41:50 +00:00
haosdent
a08b7733fd [CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-30 07:48:33 +00:00
Juan Pérez de Algaba
57861ae48d (security) Fix SSRF in batch runner download_bytes_from_url (#38482)
Signed-off-by: jperezde <jperezde@redhat.com>
2026-03-30 07:10:01 +00:00
Andreas Karatzas
bea23536f6 [CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-30 05:36:45 +00:00
Stanislav Kirillov
a6db99ba02 [Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-30 04:07:28 +00:00
Andreas Karatzas
4f2ed5fddb [ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-30 10:30:26 +08:00
Kyle Sayers
d28d86e8a3 [QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-29 14:56:41 -06:00
Wentao Ye
995dea1354 [Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-29 18:12:50 +00:00
Andreas Karatzas
43cc5138e5 [ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-28 22:08:03 -07:00
haosdent
d39b8daf5f [Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-29 00:27:52 +00:00
Walter Beller-Morales
fafca38adc [BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-28 18:30:54 +00:00
haosdent
b2bc736b12 [CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-28 22:43:24 +08:00
Bvicii
bda3eda82d [Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
2026-03-28 06:32:52 -07:00
yzong-rh
6dad4c5722 [Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
2026-03-28 09:06:56 +00:00
Nicolò Lucchesi
44a6528028 [CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-27 13:25:19 -07:00
Kyle Sayers
648edcf729 [QeRL] Compose online quantization with quantized reloading (#38032)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-27 13:22:33 -07:00
Jonas M. Kübler
98e7f223b9 enable skipping of SW attention layers when using FP8 KV cache (#33695)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
2026-03-27 07:25:02 -06:00
Juan Pérez de Algaba
b111f8a61f fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952)
Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2026-03-27 09:02:10 -04:00
Sage Moore
497e234d38 [EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-03-27 10:18:46 +01:00
dtc
6287e7fa20 [P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
2026-03-27 09:26:40 +01:00
Flora Feng
aee4c14689 [Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-27 14:42:26 +08:00
Li, Jiang
becaed6ec8 [CPU] Support CT W4A16 on CPU MP kernel (#38219)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-03-27 14:15:28 +08:00
Or Ozeri
7cc302dd87 [kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-27 08:38:33 +03:00
Bvicii
999dfc1622 [Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-26 22:17:00 -07:00
Giancarlo Delfin
c32e97602d [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
2026-03-26 13:38:12 -07:00