Harry Huang
|
45f526d652
|
[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-03-17 00:38:52 +00:00 |
|
Andreas Karatzas
|
4f9b14c21c
|
[CI] Stabilize multinode DP internal LB completion tests (#36356)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 15:40:23 -07:00 |
|
rasmith
|
2cc26c3a99
|
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-16 13:22:57 -07:00 |
|
Flora Feng
|
dfa8852db2
|
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-16 15:53:07 -04:00 |
|
Nicolò Lucchesi
|
f5c081d432
|
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
|
2026-03-16 19:58:06 +01:00 |
|
haosdent
|
ca1954d58c
|
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-16 19:03:10 +02:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
haosdent
|
116ed130f4
|
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-16 10:30:23 +01:00 |
|
Andreas Karatzas
|
a2956a0f8e
|
[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:08:51 +08:00 |
|
Andreas Karatzas
|
911355e216
|
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:07:27 +08:00 |
|
Benjamin Chislett
|
8b346309a5
|
[Refactor] Consolidate SupportsEagle (#36063)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-13 23:22:40 +00:00 |
|
Kevin H. Luu
|
f1816fb192
|
[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-13 14:16:02 -07:00 |
|
Itay Alroy
|
d5af196c18
|
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-03-13 09:25:33 -04:00 |
|
Or Ozeri
|
cfaf4668f7
|
[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-13 08:04:21 +00:00 |
|
Sage
|
a2268617cf
|
[Frontend] Delegate preprocessing to OpenAIServingRender (#36483)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-13 00:39:43 -07:00 |
|
Csrayz
|
bc2c0c86ef
|
[Frontend] Fix usage incorrectly returned with empty stream_options` (#36379)
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
|
2026-03-13 03:33:04 +00:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Martin Hickey
|
7f1f36bf91
|
[CI] Fix mypy for vllm/reasoning (#35742)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-12 12:21:33 +00:00 |
|
sfeiqiang
|
8cb24d3aed
|
[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
|
2026-03-12 00:46:20 -07:00 |
|
Chauncey
|
9fe404ed04
|
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-12 15:03:50 +08:00 |
|
Wentao Ye
|
c34ba6b961
|
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-12 08:37:01 +08:00 |
|
Or Ozeri
|
7ee5d5093b
|
[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 20:43:40 +00:00 |
|
Or Ozeri
|
a1a3523a56
|
[KVConnector] Support worker -> scheduler metadata (#31964)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 17:36:37 +00:00 |
|
Nicolò Lucchesi
|
098d844731
|
[NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-11 01:11:23 -07:00 |
|
Sladyn
|
4aaaf8c8ce
|
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
|
2026-03-11 04:35:33 +00:00 |
|
Wentao Ye
|
a8ff2cca92
|
[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 21:25:30 -07:00 |
|
Nick Hill
|
2a68464c5b
|
[Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 11:17:26 -07:00 |
|
Srinivasoo7
|
106ff69c4e
|
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 14:43:40 +00:00 |
|
Wentao Ye
|
7279374f91
|
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-09 20:55:58 -07:00 |
|
Micah Williamson
|
4ff9b045fe
|
[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-09 13:27:55 -05:00 |
|
Andreas Karatzas
|
c174d54f86
|
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-09 12:02:41 -05:00 |
|
Matthew Bonanni
|
77a73458e3
|
Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 07:17:14 -07:00 |
|
liuzhenwei
|
1bc9c77f6d
|
[XPU] Add test script of PD disaggregation (#36434)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-03-09 05:50:27 +00:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
PatchyTIS
|
a6be75dbd2
|
[Core] NGram GPU Implementation compatible with Async Scheduler (#29184)
|
2026-03-07 13:51:37 -08:00 |
|
qli88
|
eebd14651f
|
[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416)
|
2026-03-07 13:49:56 -08:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Harry Mellor
|
e2090bf3af
|
[CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-06 11:50:28 +00:00 |
|
Nicolò Lucchesi
|
5b3ba94ab4
|
[Core][KVConnector] Support HMA+NixlConnector (#35758)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-06 08:51:21 +01:00 |
|
zhanqiuhu
|
90f3c01fa4
|
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-06 08:50:44 +01:00 |
|
cong-or
|
57c84ff129
|
perf: add __slots__ to KVCacheBlock (#36164)
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
|
2026-03-05 22:04:09 -08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Ning Xie
|
176c799f4c
|
[openai api] log exception in exception handler (1/N) (#31164)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-05 16:00:12 +00:00 |
|
Or Ozeri
|
612e7729c2
|
[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-05 14:25:15 +00:00 |
|
Benjamin Chislett
|
57c629e9c1
|
[Bugfix] Fix block_size for hybrid model MTP (#36036)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-05 06:10:54 +00:00 |
|
Qi Wang
|
6aa6ad8992
|
[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783)
Signed-off-by: Qi Wang <qiwa@nvidia.com>
|
2026-03-04 15:01:30 +01:00 |
|
Ronen Schaffer
|
bb6888b8b1
|
[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() (#35846)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-03-04 14:25:33 +02:00 |
|
haosdent
|
d6e04f4c43
|
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-04 11:56:22 +01:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
ojhaanshika
|
e05cb3b93e
|
TRTLLM gen-full attn Test Coverage (#34986)
Signed-off-by: Anshika Ojha <anshikao@nvidia.com>
Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>
|
2026-03-03 11:35:34 -05:00 |
|