biondizzle/vllm - vllm

Author	SHA1	Message	Date
Francesco Fusco	298e510848	[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318 ) Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>	2026-03-21 09:29:43 +00:00
Santino Ramos	85f671b8e1	[Model Runner V2] Support Streaming Inputs (#37028 ) Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>	2026-03-20 20:42:25 +00:00
Lucas Wilkinson	e1d85e5c24	[Attention] Support distinguishing between short extends and decodes (#37303 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-03-20 10:49:36 -07:00
Flora Feng	b4c1aef21c	[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 02:50:34 -07:00
Flora Feng	9040151fe1	[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 11:31:43 +08:00
tianshu-Michael-yu	269bf46d99	fix: disambiguate multimodal prefix cache keys (#36708 ) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-03-20 10:33:20 +08:00
zhanqiuhu	d49f273144	[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310 )	2026-03-19 08:22:00 +01:00
Thillai Chithambaram	828f862acb	[Bugfix] Expand quantization method support in perf metrics (#37231 ) Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>	2026-03-18 23:54:19 +00:00
Andy Lo	577df69b26	[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 23:07:29 +00:00
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Or Ozeri	525f2eeb0b	[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 14:42:46 +01:00
Andy Lo	98b09ddc27	[NIXL][Bugfix] metrics & testing minor bug (#36051 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 14:39:14 +01:00
Or Ozeri	fcf0687b27	[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-18 08:49:53 +02:00
liuzhenwei	86b7e3c95a	[XPU] skip unsupported ut and update test_nixl_connector (#37179 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-18 13:32:59 +08:00
Andreas Karatzas	ce2ef42fd3	[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 05:26:20 +00:00
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Benjamin Chislett	8a680463fa	[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447 )	2026-03-17 07:07:33 +01:00
Flora Feng	3e3d320c1b	[Refactor] Relocate responses API tests (#37241 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 05:14:52 +00:00
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Andreas Karatzas	4f9b14c21c	[CI] Stabilize multinode DP internal LB completion tests (#36356 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 15:40:23 -07:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00
Fynn Schmitt-Ulms	04bf5a35fa	[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013 )	2026-03-16 14:53:45 +01:00
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Andreas Karatzas	a2956a0f8e	[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:08:51 +08:00
Andreas Karatzas	911355e216	[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:07:27 +08:00
Benjamin Chislett	8b346309a5	[Refactor] Consolidate SupportsEagle (#36063 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-13 23:22:40 +00:00
Kevin H. Luu	f1816fb192	[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 14:16:02 -07:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Or Ozeri	cfaf4668f7	[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-13 08:04:21 +00:00
Sage	a2268617cf	[Frontend] Delegate preprocessing to `OpenAIServingRender` (#36483 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-13 00:39:43 -07:00
Csrayz	bc2c0c86ef	[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2026-03-13 03:33:04 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Martin Hickey	7f1f36bf91	[CI] Fix mypy for vllm/reasoning (#35742 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-12 12:21:33 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Chauncey	9fe404ed04	[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-12 15:03:50 +08:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Or Ozeri	7ee5d5093b	[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 20:43:40 +00:00
Or Ozeri	a1a3523a56	[KVConnector] Support worker -> scheduler metadata (#31964 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 17:36:37 +00:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
Sladyn	4aaaf8c8ce	feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503 ) Signed-off-by: sladynnunes <snunes@usc.edu>	2026-03-11 04:35:33 +00:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
Nick Hill	2a68464c5b	[Test] `test_async_scheduling.py` improvements (#36340 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 11:17:26 -07:00
Srinivasoo7	106ff69c4e	feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342 ) Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com> Signed-off-by: Sriusa4414@gmail.com Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 14:43:40 +00:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
Micah Williamson	4ff9b045fe	[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-09 13:27:55 -05:00

1083 Commits