biondizzle/vllm - vllm

Author	SHA1	Message	Date
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Or Ozeri	525f2eeb0b	[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 14:42:46 +01:00
Andy Lo	98b09ddc27	[NIXL][Bugfix] metrics & testing minor bug (#36051 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 14:39:14 +01:00
Or Ozeri	fcf0687b27	[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-18 08:49:53 +02:00
liuzhenwei	86b7e3c95a	[XPU] skip unsupported ut and update test_nixl_connector (#37179 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-18 13:32:59 +08:00
Andreas Karatzas	ce2ef42fd3	[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 05:26:20 +00:00
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Benjamin Chislett	8a680463fa	[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447 )	2026-03-17 07:07:33 +01:00
Flora Feng	3e3d320c1b	[Refactor] Relocate responses API tests (#37241 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 05:14:52 +00:00
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Andreas Karatzas	4f9b14c21c	[CI] Stabilize multinode DP internal LB completion tests (#36356 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 15:40:23 -07:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00
Fynn Schmitt-Ulms	04bf5a35fa	[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013 )	2026-03-16 14:53:45 +01:00
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Andreas Karatzas	a2956a0f8e	[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:08:51 +08:00
Andreas Karatzas	911355e216	[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:07:27 +08:00
Benjamin Chislett	8b346309a5	[Refactor] Consolidate SupportsEagle (#36063 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-13 23:22:40 +00:00
Kevin H. Luu	f1816fb192	[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 14:16:02 -07:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Or Ozeri	cfaf4668f7	[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-13 08:04:21 +00:00
Sage	a2268617cf	[Frontend] Delegate preprocessing to `OpenAIServingRender` (#36483 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-13 00:39:43 -07:00
Csrayz	bc2c0c86ef	[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2026-03-13 03:33:04 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Martin Hickey	7f1f36bf91	[CI] Fix mypy for vllm/reasoning (#35742 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-12 12:21:33 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Chauncey	9fe404ed04	[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-12 15:03:50 +08:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Or Ozeri	7ee5d5093b	[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 20:43:40 +00:00
Or Ozeri	a1a3523a56	[KVConnector] Support worker -> scheduler metadata (#31964 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 17:36:37 +00:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
Sladyn	4aaaf8c8ce	feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503 ) Signed-off-by: sladynnunes <snunes@usc.edu>	2026-03-11 04:35:33 +00:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
Nick Hill	2a68464c5b	[Test] `test_async_scheduling.py` improvements (#36340 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 11:17:26 -07:00
Srinivasoo7	106ff69c4e	feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342 ) Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com> Signed-off-by: Sriusa4414@gmail.com Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 14:43:40 +00:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
Micah Williamson	4ff9b045fe	[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-09 13:27:55 -05:00
Andreas Karatzas	c174d54f86	[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:02:41 -05:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
liuzhenwei	1bc9c77f6d	[XPU] Add test script of PD disaggregation (#36434 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-09 05:50:27 +00:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
qli88	eebd14651f	[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416 )	2026-03-07 13:49:56 -08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Harry Mellor	e2090bf3af	[CI] Fix startup error test (#36230 ) A change in engine startup error messages in #35478 caused this test failure. Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-06 11:50:28 +00:00
Nicolò Lucchesi	5b3ba94ab4	[Core][KVConnector] Support HMA+NixlConnector (#35758 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-06 08:51:21 +01:00
zhanqiuhu	90f3c01fa4	[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158 ) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Zhanqiu Hu <zh338@cornell.edu> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-06 08:50:44 +01:00

1073 Commits