Nicolò Lucchesi
|
cc06b4e86b
|
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-30 09:41:50 +00:00 |
|
haosdent
|
a08b7733fd
|
[CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-30 07:48:33 +00:00 |
|
Juan Pérez de Algaba
|
57861ae48d
|
(security) Fix SSRF in batch runner download_bytes_from_url (#38482)
Signed-off-by: jperezde <jperezde@redhat.com>
|
2026-03-30 07:10:01 +00:00 |
|
Andreas Karatzas
|
bea23536f6
|
[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 05:36:45 +00:00 |
|
Stanislav Kirillov
|
a6db99ba02
|
[Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 04:07:28 +00:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
haosdent
|
d39b8daf5f
|
[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-29 00:27:52 +00:00 |
|
Walter Beller-Morales
|
fafca38adc
|
[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-28 18:30:54 +00:00 |
|
haosdent
|
b2bc736b12
|
[CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-28 22:43:24 +08:00 |
|
Bvicii
|
bda3eda82d
|
[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
|
2026-03-28 06:32:52 -07:00 |
|
yzong-rh
|
6dad4c5722
|
[Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
|
2026-03-28 09:06:56 +00:00 |
|
Nicolò Lucchesi
|
44a6528028
|
[CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-27 13:25:19 -07:00 |
|
Kyle Sayers
|
648edcf729
|
[QeRL] Compose online quantization with quantized reloading (#38032)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-27 13:22:33 -07:00 |
|
Jonas M. Kübler
|
98e7f223b9
|
enable skipping of SW attention layers when using FP8 KV cache (#33695)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2026-03-27 07:25:02 -06:00 |
|
Juan Pérez de Algaba
|
b111f8a61f
|
fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952)
Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-27 09:02:10 -04:00 |
|
Sage Moore
|
497e234d38
|
[EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-27 10:18:46 +01:00 |
|
dtc
|
6287e7fa20
|
[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-03-27 09:26:40 +01:00 |
|
Flora Feng
|
aee4c14689
|
[Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-27 14:42:26 +08:00 |
|
Li, Jiang
|
becaed6ec8
|
[CPU] Support CT W4A16 on CPU MP kernel (#38219)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-27 14:15:28 +08:00 |
|
Or Ozeri
|
7cc302dd87
|
[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-27 08:38:33 +03:00 |
|
Bvicii
|
999dfc1622
|
[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-26 22:17:00 -07:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
Andreas Karatzas
|
9c3ae04bfe
|
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 16:51:18 +00:00 |
|
Divakar Verma
|
b9dbc5c4ab
|
[Mamba][APC] Add test case to compare apc outputs (#34977)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-03-26 16:40:35 +00:00 |
|
Andreas Karatzas
|
bdc1719eb9
|
[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 09:26:46 -07:00 |
|
Zhewen Li
|
be1a85b7a2
|
Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169)
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-03-26 07:59:09 -07:00 |
|
Cyrus Leung
|
2e225f7bd2
|
[Renderer] Consolidate factory methods (#38218)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 12:19:22 +00:00 |
|
wang.yuqi
|
dcdc145893
|
[CI] Reorganize scoring tests (#38207)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-26 12:07:01 +00:00 |
|
Andreas Karatzas
|
f2d16207c7
|
[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:57:00 +08:00 |
|
Andreas Karatzas
|
37a83007fe
|
[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:54:59 +08:00 |
|
Wentao Ye
|
bf5eec638d
|
[Refactor] Remove unused utils (#38153)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 17:08:19 +08:00 |
|
Vadim Gimpelson
|
52069012fe
|
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-26 01:21:47 -07:00 |
|
Matej Rojec
|
2908094567
|
Add /v1/chat/completions/batch endpoint for batched chat completions (#38011)
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
|
2026-03-26 12:13:33 +08:00 |
|
Woosuk Kwon
|
144030c84e
|
Relocate Encoder CUDA graph manager (#38116)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 20:52:12 -07:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|
Guillaume Guy
|
70a2152830
|
[MultiModal] add support for numpy array embeddings (#38119)
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-25 20:13:04 +00:00 |
|
Andreas Karatzas
|
7d6917bef5
|
[ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-03-25 13:46:40 -05:00 |
|
Nick Hill
|
72cad44d3c
|
[Frontend] Move APIServerProcessManager target server fn (#38115)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 18:14:41 +00:00 |
|
Cyrus Leung
|
ba2f0acc2d
|
[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-25 10:22:54 -07:00 |
|
Yongye Zhu
|
678b3c99e8
|
[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050)
|
2026-03-25 10:16:40 -07:00 |
|
Richard Zou
|
6e37c46b35
|
[compile] Add some more startup tests for top models (#38046)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-25 12:02:22 -04:00 |
|
Wentao Ye
|
1bf2ddd0ee
|
[Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR (#38048)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-25 11:41:44 -04:00 |
|
Wentao Ye
|
d7e93e13fb
|
[Feature] EPLB Support for GPU Model Runner v2 (#37488)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-25 08:16:39 -07:00 |
|
Andrii Skliar
|
cd7643015e
|
[Feature] Support per-draft-model MoE backend via --speculative-config (#37880)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: [Andrii Skliar] <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-03-25 14:31:52 +00:00 |
|
Harry Mellor
|
d215d1efca
|
[Mypy] Better fixes for the mypy issues in vllm/config (#37902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 06:14:43 -07:00 |
|
Matthias Gehre
|
a889b7f584
|
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-25 11:42:58 +00:00 |
|