Ben Browning
|
cb0b443274
|
[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-04-01 03:00:31 +00:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Yifan Qiao
|
91e4521f9f
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-03-31 17:58:37 -07:00 |
|
Vedant V Jhaveri
|
2e56975657
|
Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-31 16:02:11 -07:00 |
|
Yanan Cao
|
cc671cb110
|
[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
|
2026-03-31 17:06:42 -04:00 |
|
Wentao Ye
|
856589ed9a
|
[Refactor] Remove dead code in kv connector and model runner (#38383)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-31 17:05:23 -04:00 |
|
yzong-rh
|
d9b90a07ac
|
[MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-31 15:43:33 -04:00 |
|
Olya Kozlova
|
598190aac3
|
[fix] Remove trtllm ragged mla prefills (#36540)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
|
2026-03-31 12:30:27 -07:00 |
|
BadrBasowid
|
077a9a8e37
|
[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-31 14:15:50 -04:00 |
|
SandishKumarHN
|
3896e021a0
|
[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010)
Signed-off-by: SandishKumarHN <sandish@fb.com>
|
2026-03-31 12:22:26 -04:00 |
|
Matthew Bonanni
|
757068dc65
|
[Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
|
2026-03-31 11:08:54 -04:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
Nicolò Lucchesi
|
7430389669
|
[Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 09:42:37 -04:00 |
|
Jiangyun Zhu
|
ea7bfde6e4
|
[CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-31 13:20:08 +00:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
wang.yuqi
|
719735d6c5
|
[CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-31 10:54:54 +00:00 |
|
Matthew Bonanni
|
7d65463528
|
[WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 12:30:25 +02:00 |
|
wang.yuqi
|
d9d21eb8e3
|
[Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-31 07:52:00 +00:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
zhangyiming
|
1ac6694297
|
[OOT] Add OOT support for linear kernel. (#37989)
Signed-off-by: menogrey <1299267905@qq.com>
|
2026-03-31 14:33:21 +08:00 |
|
Flora Feng
|
d53cb9cb8e
|
[Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-31 13:41:36 +08:00 |
|
Andreas Karatzas
|
b9cdc85207
|
[ROCm][CI] Fix Whisper translation test attention backend selection (#38508)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-31 13:21:49 +08:00 |
|
SandishKumarHN
|
bcc6f67447
|
[Bugfix] Use null block (0) for padded block table entries (#35431)
Signed-off-by: SandishKumarHN <sandish@fb.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-30 14:02:51 -07:00 |
|
Micah Williamson
|
d9c7db18da
|
[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-30 20:26:46 +00:00 |
|
Ilya Markov
|
12701e8af2
|
[EPLB] Optimize eplb mapping and record in router for prefill (#36261)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-03-30 19:48:33 +00:00 |
|
Benjamin Chislett
|
494636b29d
|
[Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-30 15:03:15 -04:00 |
|
Chendi.Xue
|
3b1dbaad4e
|
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-30 16:47:30 +00:00 |
|
Johnny
|
b4a2f3ac36
|
[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
|
2026-03-30 09:36:18 -07:00 |
|
roikoren755
|
8e6293e838
|
[Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-30 12:33:49 -04:00 |
|
Hongxia Yang
|
dbdd9ae067
|
[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-30 15:49:23 +00:00 |
|
Matthias Gehre
|
e8b055a5ac
|
[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-30 07:30:52 -07:00 |
|
Andreas Karatzas
|
677424c7ac
|
[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 04:58:53 -07:00 |
|
Collin McCarthy
|
1031c84c36
|
Fix ambiguous num_blocks for hybrid attn mamba (#37236)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-30 11:09:45 +00:00 |
|
aliialsaeedii
|
7e76af14fa
|
[Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253)
Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>
|
2026-03-30 10:26:46 +00:00 |
|
yzong-rh
|
3683fe6c06
|
[Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls (#38158)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 10:12:13 +00:00 |
|
Nicolò Lucchesi
|
cc06b4e86b
|
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-30 09:41:50 +00:00 |
|
haosdent
|
a08b7733fd
|
[CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-30 07:48:33 +00:00 |
|
Juan Pérez de Algaba
|
57861ae48d
|
(security) Fix SSRF in batch runner download_bytes_from_url (#38482)
Signed-off-by: jperezde <jperezde@redhat.com>
|
2026-03-30 07:10:01 +00:00 |
|
Andreas Karatzas
|
bea23536f6
|
[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 05:36:45 +00:00 |
|
Stanislav Kirillov
|
a6db99ba02
|
[Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 04:07:28 +00:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
haosdent
|
d39b8daf5f
|
[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-29 00:27:52 +00:00 |
|
Walter Beller-Morales
|
fafca38adc
|
[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-28 18:30:54 +00:00 |
|
haosdent
|
b2bc736b12
|
[CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-28 22:43:24 +08:00 |
|
Bvicii
|
bda3eda82d
|
[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
|
2026-03-28 06:32:52 -07:00 |
|
yzong-rh
|
6dad4c5722
|
[Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
|
2026-03-28 09:06:56 +00:00 |
|
Nicolò Lucchesi
|
44a6528028
|
[CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-27 13:25:19 -07:00 |
|