biondizzle/vllm

Author	SHA1	Message	Date
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Andreas Karatzas	4f9b14c21c	[CI] Stabilize multinode DP internal LB completion tests (#36356 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 15:40:23 -07:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00
Fynn Schmitt-Ulms	04bf5a35fa	[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013 )	2026-03-16 14:53:45 +01:00
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Andreas Karatzas	a2956a0f8e	[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:08:51 +08:00
Andreas Karatzas	911355e216	[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:07:27 +08:00
Benjamin Chislett	8b346309a5	[Refactor] Consolidate SupportsEagle (#36063 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-13 23:22:40 +00:00
Kevin H. Luu	f1816fb192	[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 14:16:02 -07:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Or Ozeri	cfaf4668f7	[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-13 08:04:21 +00:00
Sage	a2268617cf	[Frontend] Delegate preprocessing to `OpenAIServingRender` (#36483 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-13 00:39:43 -07:00
Csrayz	bc2c0c86ef	[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2026-03-13 03:33:04 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Martin Hickey	7f1f36bf91	[CI] Fix mypy for vllm/reasoning (#35742 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-12 12:21:33 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Chauncey	9fe404ed04	[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-12 15:03:50 +08:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Or Ozeri	7ee5d5093b	[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 20:43:40 +00:00
Or Ozeri	a1a3523a56	[KVConnector] Support worker -> scheduler metadata (#31964 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 17:36:37 +00:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
Sladyn	4aaaf8c8ce	feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503 ) Signed-off-by: sladynnunes <snunes@usc.edu>	2026-03-11 04:35:33 +00:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
Nick Hill	2a68464c5b	[Test] `test_async_scheduling.py` improvements (#36340 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-10 11:17:26 -07:00
Srinivasoo7	106ff69c4e	feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342 ) Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com> Signed-off-by: Sriusa4414@gmail.com Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 14:43:40 +00:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
Micah Williamson	4ff9b045fe	[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-09 13:27:55 -05:00
Andreas Karatzas	c174d54f86	[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:02:41 -05:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
liuzhenwei	1bc9c77f6d	[XPU] Add test script of PD disaggregation (#36434 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-03-09 05:50:27 +00:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
qli88	eebd14651f	[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416 )	2026-03-07 13:49:56 -08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Harry Mellor	e2090bf3af	[CI] Fix startup error test (#36230 ) A change in engine startup error messages in #35478 caused this test failure. Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-06 11:50:28 +00:00
Nicolò Lucchesi	5b3ba94ab4	[Core][KVConnector] Support HMA+NixlConnector (#35758 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-06 08:51:21 +01:00
zhanqiuhu	90f3c01fa4	[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158 ) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Zhanqiu Hu <zh338@cornell.edu> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-06 08:50:44 +01:00
cong-or	57c84ff129	perf: add __slots__ to KVCacheBlock (#36164 ) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-05 22:04:09 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Or Ozeri	612e7729c2	[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-05 14:25:15 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
Qi Wang	6aa6ad8992	[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-03-04 15:01:30 +01:00
Ronen Schaffer	bb6888b8b1	[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC `prepare_store()` (#35846 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-04 14:25:33 +02:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094 ) (#34571 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
ojhaanshika	e05cb3b93e	TRTLLM gen-full attn Test Coverage (#34986 ) Signed-off-by: Anshika Ojha <anshikao@nvidia.com> Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>	2026-03-03 11:35:34 -05:00

1063 Commits