biondizzle/vllm

Author	SHA1	Message	Date
dtc	6287e7fa20	[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-03-27 09:26:40 +01:00
Or Ozeri	7cc302dd87	[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-27 08:38:33 +03:00
Giancarlo Delfin	c32e97602d	[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-26 13:38:12 -07:00
Woosuk Kwon	144030c84e	Relocate Encoder CUDA graph manager (#38116 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 20:52:12 -07:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Andrii Skliar	cd7643015e	[Feature] Support per-draft-model MoE backend via `--speculative-config` (#37880 ) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: [Andrii Skliar] <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-25 14:31:52 +00:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00
Sungjae Lee	4731884796	[Feature] limit thinking tokens (hard limit) (#20859 ) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 09:53:07 -07:00
Ronen Schaffer	e3c6c10cad	[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into `cpu/` package (#37874 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-24 07:02:51 +02:00
Wentao Ye	c59a132f96	[V0 Deprecation] Refactor kv cache from list to element (#37487 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 20:10:11 -07:00
Ranran	dc6908ac6a	[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007 ) Signed-off-by: Ranran <1012869439@qq.com> Signed-off-by: Ranran <hzz5361@psu.edu> Signed-off-by: ran <hzz5361@psu.edu> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-03-23 18:31:14 -04:00
Matthew Bonanni	fafe76b4af	[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2026-03-23 15:37:22 -04:00
Nicolò Lucchesi	1cbbcfe8a3	[CI][PD] Add Hybrid SSM integration tests to CI (#37657 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-23 23:58:19 +08:00
Baorun (Lauren) Mu	f85e479e66	[Feature] ViT Full CUDA Graph (#35963 ) Signed-off-by: Baorun Mu <bmu@nvidia.com>	2026-03-23 13:01:10 +08:00
Wentao Ye	eaf4978621	[Test] Only Run MLA model when user explicitly set for batch invariance (#37719 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-22 09:09:12 -04:00
Andreas Karatzas	66f927f205	[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-22 03:22:24 +00:00
Brandon Pelfrey	80b70884eb	Add tensor IPC transfer mechanism for multimodal data (#32104 ) Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com> Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-21 20:10:20 +00:00
Francesco Fusco	298e510848	[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318 ) Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>	2026-03-21 09:29:43 +00:00
Santino Ramos	85f671b8e1	[Model Runner V2] Support Streaming Inputs (#37028 ) Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>	2026-03-20 20:42:25 +00:00
Lucas Wilkinson	e1d85e5c24	[Attention] Support distinguishing between short extends and decodes (#37303 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-03-20 10:49:36 -07:00
Flora Feng	b4c1aef21c	[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 02:50:34 -07:00
Flora Feng	9040151fe1	[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 11:31:43 +08:00
tianshu-Michael-yu	269bf46d99	fix: disambiguate multimodal prefix cache keys (#36708 ) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-03-20 10:33:20 +08:00
zhanqiuhu	d49f273144	[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310 )	2026-03-19 08:22:00 +01:00
Thillai Chithambaram	828f862acb	[Bugfix] Expand quantization method support in perf metrics (#37231 ) Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>	2026-03-18 23:54:19 +00:00
Andy Lo	577df69b26	[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 23:07:29 +00:00
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Or Ozeri	525f2eeb0b	[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 14:42:46 +01:00
Andy Lo	98b09ddc27	[NIXL][Bugfix] metrics & testing minor bug (#36051 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 14:39:14 +01:00
Or Ozeri	fcf0687b27	[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-18 08:49:53 +02:00
liuzhenwei	86b7e3c95a	[XPU] skip unsupported ut and update test_nixl_connector (#37179 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-18 13:32:59 +08:00
Andreas Karatzas	ce2ef42fd3	[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 05:26:20 +00:00
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Benjamin Chislett	8a680463fa	[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447 )	2026-03-17 07:07:33 +01:00
Flora Feng	3e3d320c1b	[Refactor] Relocate responses API tests (#37241 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 05:14:52 +00:00
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Andreas Karatzas	4f9b14c21c	[CI] Stabilize multinode DP internal LB completion tests (#36356 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 15:40:23 -07:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00
Fynn Schmitt-Ulms	04bf5a35fa	[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013 )	2026-03-16 14:53:45 +01:00
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Andreas Karatzas	a2956a0f8e	[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:08:51 +08:00
Andreas Karatzas	911355e216	[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 16:07:27 +08:00

1103 Commits