biondizzle/vllm - vllm

Author	SHA1	Message	Date
Lucas Wilkinson	b4f64e5b02	Update FlashMLA (#32491 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 13:03:37 +08:00
Nick Hill	6f067b1fb7	[Cleanup] Remove unused `KVConnectorModelRunnerMixin` methods (#32077 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 11:16:37 +08:00
Or Ozeri	7013e9ac8f	OffloadingConnector: Prevent redundant loads (#29087 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-21 01:15:42 +00:00
Lucas Wilkinson	2261340806	[Misc] Remove pad_for_cudagraphs from config (#30143 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-20 15:05:48 -05:00
Rahul Tuli	f0feb1cf81	Test: added acceptance length tests (#32030 ) Signed-off-by: rahul-tuli <rtuli@redhat.com>	2026-01-20 18:55:15 +00:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
Vadim Gimpelson	0727cc9ecf	[BUGFIX] Fix `test_mla_backends.py`. Scale MLA projection weights to prevent numerical instability (#32529 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-19 13:49:29 -05:00
Andreas Karatzas	c0a350ca73	[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-19 09:57:54 +00:00
Michael Goin	1be5a73571	[UX] Use kv_offloading_backend=native by default (#32421 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 18:55:11 +00:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Chauncey	707b44cc28	[Refactor] [11/N] to simplify the mcp architecture (#32396 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 18:49:31 +08:00
Cyrus Leung	cbbae38f93	[2/N] Move cache factories to MM registry (#32382 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 01:02:30 -08:00
dtc	1e584823f8	[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-01-15 16:31:16 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
Micah Williamson	773d7073ae	[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-15 04:53:43 +00:00
Ryan Rock	15422ed3f7	[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-15 04:01:42 +00:00
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
Matthew Bonanni	2263d44b68	[4/N][Attention] Move MLA common to model_executor (#32060 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-13 09:08:45 -08:00
Matthew Bonanni	98f60e5acb	[6/N][Attention] Move utils to more appropriate locations (#32215 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-13 05:38:52 -08:00
Chauncey	fefce49807	[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 13:01:39 +00:00
Andreas Karatzas	df7e12715f	[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-13 15:14:30 +08:00
Nicolò Lucchesi	f8bd8394e3	[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 20:38:49 +00:00
Or Ozeri	2be765b68a	[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 18:39:38 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Nicolò Lucchesi	5b68107411	[Misc][PD] Fix `get_attn_backend` usage in transfer connectors (#31988 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 18:10:05 +01:00
Asaf Joseph Gardin	8fb2c135be	[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-01-12 17:02:38 +00:00
Or Ozeri	9cddbdba6d	OffloadingConnector: Add cpu_bytes_to_use configuration (#24498 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 15:00:43 +00:00
Or Ozeri	4c16ba617f	[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-11 08:05:36 +00:00
Or Ozeri	2a4dbe24ea	[BugFix] Wait for compute before offloading KV to CPU (#31341 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 22:25:08 +00:00
Or Ozeri	028599739d	[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 12:39:25 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
inkcherry	4505849b30	[ROCm][PD] add moriio kv connector. (#29304 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-01-09 14:01:57 +00:00
Andreas Karatzas	e02706d2d2	[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA parity in `test_async_scheduling` (#32000 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 20:48:32 +08:00
Nick Hill	29ce48221c	[Cleanup] Remove obsolete spec decoding compatibility logic (#32003 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 05:44:18 +00:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
Nick Hill	11cec296dd	[BugFix] Add spec-decode-incompatible request param validation (#31982 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 00:08:21 +00:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Andreas Karatzas	5f2a473ff3	[ROCm][CI] v1 cpu offloading attention backend fix (#31833 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 14:37:50 +08:00
Andreas Karatzas	087a138963	[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 04:35:25 +00:00
Richard Zou	a79079feef	[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-08 04:04:58 +00:00
Nick Hill	10ef65eded	[BugFix] Fix bad words with speculative decoding (#31908 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-07 15:46:42 -05:00
Kfir Toledo	b89443b8d9	[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761 ) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>	2026-01-07 16:59:43 +00:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808 ) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Lucas Wilkinson	e0327c9db2	[Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-06 04:05:17 -08:00
John Calderon	2f4e6548ef	[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874 ) Signed-off-by: John Calderon <jcalderon@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 00:23:00 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00

1011 Commits