Lucas Kabela
|
a8c6ee9b78
|
[Performance Improvement] Update batched_count_greater_than to handle batch size 1 without recompile (#38933)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-09 23:51:31 +08:00 |
|
wang.yuqi
|
66c079ae83
|
[Frontend][4/n] Improve pooling entrypoints | pooling. (#39153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-09 10:09:45 +00:00 |
|
sihao_li
|
e80e633927
|
[XPU] Skip VLLM_BATCH_INVARIANT for XPU in EAGLE DP test (#39164)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-09 12:45:16 +08:00 |
|
Chendi.Xue
|
ef5a226819
|
[PD][HeteroArch]Fix accuracy issue with CPU_ATTN as Decoder and Flash_ATTN as prefiller (#38935)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-04-09 11:19:07 +08:00 |
|
Wentao Ye
|
3352bf8b03
|
[CI Bug] Fix pre-commit issue in main (#39347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 14:10:05 -07:00 |
|
triangleXIV
|
7c94ae16c6
|
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 14:03:17 -07:00 |
|
Rishi Puri
|
ad05edfbca
|
tests/v1/e2e/spec_decode: assert async scheduling is used (#39206)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2026-04-08 20:30:03 +00:00 |
|
Wentao Ye
|
2018137242
|
[Feature] Batch invariant nvfp4 linear support (#39322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 16:29:13 -04:00 |
|
Or Ozeri
|
512c5eb455
|
[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-08 19:50:28 +03:00 |
|
haosdent
|
8904fc4d19
|
[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 (#34875)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-04-08 15:30:00 +00:00 |
|
Giancarlo Delfin
|
5daf62271d
|
[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-04-07 17:37:37 -07:00 |
|
ibifrost
|
96b5004b71
|
[KVConnector] Support 3FS KVConnector (#37636)
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com>
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2026-04-07 15:46:00 +00:00 |
|
Ronen Schaffer
|
7c139ab23f
|
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-04-07 15:14:45 +03:00 |
|
Andreas Karatzas
|
a435e3108d
|
[ROCm][CI] Fix test repo-root assumptions (#39053)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 13:36:21 +08:00 |
|
zhanqiuhu
|
bfdc0a3a99
|
[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635)
|
2026-04-06 19:07:02 +02:00 |
|
Walter Beller-Morales
|
e69a265135
|
[Feat][Core] safely abort requests when FSM fails to advance (#38663)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-04-06 08:00:16 -07:00 |
|
Julien Denize
|
fef56c1855
|
[Mistral Grammar] Support Grammar Factory (#38150)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-04-06 10:28:51 -04:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
Aaron Batilo
|
9a528260ef
|
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987)
Signed-off-by: Aaron Batilo <abatilo@coreweave.com>
|
2026-04-05 02:41:54 -07:00 |
|
Yusuf Mohammad
|
46f02e00f2
|
[Bugfix] Fix AWQ models batch invariance issues (#38670)
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
|
2026-04-03 14:54:15 +00:00 |
|
wliao2
|
32e0c0bfa2
|
refactor hard coded device string in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
|
2026-04-03 11:21:47 +08:00 |
|
zhanqiuhu
|
7b743ba953
|
[CI] Fix: pass string cache_dtype in test_register_kv_caches (#38836)
|
2026-04-02 19:42:09 +00:00 |
|
wang.yuqi
|
a9b4f07ba2
|
[Frontend] Re-enable running MaxSim on GPU (#38620)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-03 00:03:13 +08:00 |
|
Chauncey
|
cbe7d18096
|
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-04-01 09:56:45 -07:00 |
|
yzong-rh
|
dc0428ebb8
|
[NIXL][BUG] Fix Triton heterogeneous TP (#37940)
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-04-01 17:23:15 +02:00 |
|
Lucas Wilkinson
|
eb47454987
|
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-01 00:15:53 -04:00 |
|
HarshRathva
|
17b72fd1c8
|
Fix priority preemption regression test in scheduler (#37051)
Signed-off-by: HarshRathva <harshrathvaai@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-01 06:36:12 +03:00 |
|
Yifan Qiao
|
91e4521f9f
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-03-31 17:58:37 -07:00 |
|
Wentao Ye
|
856589ed9a
|
[Refactor] Remove dead code in kv connector and model runner (#38383)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-31 17:05:23 -04:00 |
|
Matthew Bonanni
|
757068dc65
|
[Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
|
2026-03-31 11:08:54 -04:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
Nicolò Lucchesi
|
7430389669
|
[Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 09:42:37 -04:00 |
|
Matthew Bonanni
|
7d65463528
|
[WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 12:30:25 +02:00 |
|
Benjamin Chislett
|
494636b29d
|
[Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-30 15:03:15 -04:00 |
|
Chendi.Xue
|
3b1dbaad4e
|
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-30 16:47:30 +00:00 |
|
Collin McCarthy
|
1031c84c36
|
Fix ambiguous num_blocks for hybrid attn mamba (#37236)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-30 11:09:45 +00:00 |
|
Nicolò Lucchesi
|
cc06b4e86b
|
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-30 09:41:50 +00:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
yzong-rh
|
6dad4c5722
|
[Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
|
2026-03-28 09:06:56 +00:00 |
|
dtc
|
6287e7fa20
|
[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-03-27 09:26:40 +01:00 |
|
Or Ozeri
|
7cc302dd87
|
[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-27 08:38:33 +03:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
Woosuk Kwon
|
144030c84e
|
Relocate Encoder CUDA graph manager (#38116)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 20:52:12 -07:00 |
|
Cyrus Leung
|
ba2f0acc2d
|
[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-25 10:22:54 -07:00 |
|
Wentao Ye
|
1bf2ddd0ee
|
[Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR (#38048)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-25 11:41:44 -04:00 |
|
Wentao Ye
|
d7e93e13fb
|
[Feature] EPLB Support for GPU Model Runner v2 (#37488)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-25 08:16:39 -07:00 |
|
Andrii Skliar
|
cd7643015e
|
[Feature] Support per-draft-model MoE backend via --speculative-config (#37880)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: [Andrii Skliar] <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-03-25 14:31:52 +00:00 |
|
Harry Mellor
|
d215d1efca
|
[Mypy] Better fixes for the mypy issues in vllm/config (#37902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 06:14:43 -07:00 |
|
Gregory Shtrasberg
|
189ddefbfd
|
[ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-25 17:42:56 +08:00 |
|