Or Ozeri
|
5dd8df0701
|
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 19:26:40 +02:00 |
|
Or Ozeri
|
525f2eeb0b
|
[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 14:42:46 +01:00 |
|
Andy Lo
|
98b09ddc27
|
[NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 14:39:14 +01:00 |
|
Or Ozeri
|
fcf0687b27
|
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-18 08:49:53 +02:00 |
|
liuzhenwei
|
86b7e3c95a
|
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-18 13:32:59 +08:00 |
|
Andreas Karatzas
|
ce2ef42fd3
|
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 05:26:20 +00:00 |
|
gxd3
|
a0dd1995c7
|
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
|
2026-03-18 12:53:28 +08:00 |
|
Yong Hoon Shin
|
de35c06c66
|
Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2026-03-17 21:29:06 +00:00 |
|
Benjamin Chislett
|
8a680463fa
|
[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447)
|
2026-03-17 07:07:33 +01:00 |
|
Flora Feng
|
3e3d320c1b
|
[Refactor] Relocate responses API tests (#37241)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 05:14:52 +00:00 |
|
Harry Huang
|
45f526d652
|
[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-03-17 00:38:52 +00:00 |
|
Andreas Karatzas
|
4f9b14c21c
|
[CI] Stabilize multinode DP internal LB completion tests (#36356)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 15:40:23 -07:00 |
|
rasmith
|
2cc26c3a99
|
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-16 13:22:57 -07:00 |
|
Flora Feng
|
dfa8852db2
|
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-16 15:53:07 -04:00 |
|
Nicolò Lucchesi
|
f5c081d432
|
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
|
2026-03-16 19:58:06 +01:00 |
|
haosdent
|
ca1954d58c
|
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-16 19:03:10 +02:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
haosdent
|
116ed130f4
|
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-16 10:30:23 +01:00 |
|
Andreas Karatzas
|
a2956a0f8e
|
[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:08:51 +08:00 |
|
Andreas Karatzas
|
911355e216
|
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:07:27 +08:00 |
|
Benjamin Chislett
|
8b346309a5
|
[Refactor] Consolidate SupportsEagle (#36063)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-13 23:22:40 +00:00 |
|
Kevin H. Luu
|
f1816fb192
|
[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-13 14:16:02 -07:00 |
|
Itay Alroy
|
d5af196c18
|
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-03-13 09:25:33 -04:00 |
|
Or Ozeri
|
cfaf4668f7
|
[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-13 08:04:21 +00:00 |
|
Sage
|
a2268617cf
|
[Frontend] Delegate preprocessing to OpenAIServingRender (#36483)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-13 00:39:43 -07:00 |
|
Csrayz
|
bc2c0c86ef
|
[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379)
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
|
2026-03-13 03:33:04 +00:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Martin Hickey
|
7f1f36bf91
|
[CI] Fix mypy for vllm/reasoning (#35742)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-12 12:21:33 +00:00 |
|
sfeiqiang
|
8cb24d3aed
|
[KV Connector] Support using FlexKV as a KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
|
2026-03-12 00:46:20 -07:00 |
|
Chauncey
|
9fe404ed04
|
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-12 15:03:50 +08:00 |
|
Wentao Ye
|
c34ba6b961
|
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-12 08:37:01 +08:00 |
|
Or Ozeri
|
7ee5d5093b
|
[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 20:43:40 +00:00 |
|
Or Ozeri
|
a1a3523a56
|
[KVConnector] Support worker -> scheduler metadata (#31964)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 17:36:37 +00:00 |
|
Nicolò Lucchesi
|
098d844731
|
[NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-11 01:11:23 -07:00 |
|
Sladyn
|
4aaaf8c8ce
|
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
|
2026-03-11 04:35:33 +00:00 |
|
Wentao Ye
|
a8ff2cca92
|
[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 21:25:30 -07:00 |
|
Nick Hill
|
2a68464c5b
|
[Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 11:17:26 -07:00 |
|
Srinivasoo7
|
106ff69c4e
|
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 14:43:40 +00:00 |
|
Wentao Ye
|
7279374f91
|
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-09 20:55:58 -07:00 |
|
Micah Williamson
|
4ff9b045fe
|
[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-09 13:27:55 -05:00 |
|
Andreas Karatzas
|
c174d54f86
|
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-09 12:02:41 -05:00 |
|
Matthew Bonanni
|
77a73458e3
|
Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 07:17:14 -07:00 |
|
liuzhenwei
|
1bc9c77f6d
|
[XPU] Add test script of PD disaggregation (#36434)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-03-09 05:50:27 +00:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
PatchyTIS
|
a6be75dbd2
|
[Core] NGram GPU Implementation compatible with Async Scheduler (#29184)
|
2026-03-07 13:51:37 -08:00 |
|
qli88
|
eebd14651f
|
[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416)
|
2026-03-07 13:49:56 -08:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Harry Mellor
|
e2090bf3af
|
[CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-06 11:50:28 +00:00 |
|
Nicolò Lucchesi
|
5b3ba94ab4
|
[Core][KVConnector] Support HMA+NixlConnector (#35758)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-06 08:51:21 +01:00 |
|
zhanqiuhu
|
90f3c01fa4
|
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-06 08:50:44 +01:00 |
|