Lucas Wilkinson
|
b4f64e5b02
|
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 13:03:37 +08:00 |
|
Nick Hill
|
6f067b1fb7
|
[Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-21 11:16:37 +08:00 |
|
Or Ozeri
|
7013e9ac8f
|
OffloadingConnector: Prevent redundant loads (#29087)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-21 01:15:42 +00:00 |
|
Lucas Wilkinson
|
2261340806
|
[Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-20 15:05:48 -05:00 |
|
Rahul Tuli
|
f0feb1cf81
|
Test: added acceptance length tests (#32030)
Signed-off-by: rahul-tuli <rtuli@redhat.com>
|
2026-01-20 18:55:15 +00:00 |
|
Tomas Ruiz
|
4a5299c93f
|
feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2026-01-19 16:05:46 -05:00 |
|
Vadim Gimpelson
|
0727cc9ecf
|
[BUGFIX] Fix test_mla_backends.py. Scale MLA projection weights to prevent numerical instability (#32529)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-01-19 13:49:29 -05:00 |
|
Andreas Karatzas
|
c0a350ca73
|
[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-19 09:57:54 +00:00 |
|
Michael Goin
|
1be5a73571
|
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-15 18:55:11 +00:00 |
|
Wentao Ye
|
b34474bf2c
|
[Feature] Support async scheduling + PP (#32359)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-15 12:06:23 -05:00 |
|
Chauncey
|
707b44cc28
|
[Refactor] [11/N] to simplify the mcp architecture (#32396)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 18:49:31 +08:00 |
|
Cyrus Leung
|
cbbae38f93
|
[2/N] Move cache factories to MM registry (#32382)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 01:02:30 -08:00 |
|
dtc
|
1e584823f8
|
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-01-15 16:31:16 +08:00 |
|
Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
Micah Williamson
|
773d7073ae
|
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-15 04:53:43 +00:00 |
|
Ryan Rock
|
15422ed3f7
|
[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-15 04:01:42 +00:00 |
|
Lumosis
|
66652e8082
|
[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
|
2026-01-14 20:10:01 +00:00 |
|
Matthew Bonanni
|
2263d44b68
|
[4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-13 09:08:45 -08:00 |
|
Matthew Bonanni
|
98f60e5acb
|
[6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-13 05:38:52 -08:00 |
|
Chauncey
|
fefce49807
|
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 13:01:39 +00:00 |
|
Andreas Karatzas
|
df7e12715f
|
[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-13 15:14:30 +08:00 |
|
Nicolò Lucchesi
|
f8bd8394e3
|
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 20:38:49 +00:00 |
|
Or Ozeri
|
2be765b68a
|
[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 18:39:38 +00:00 |
|
Matthew Bonanni
|
20228cb851
|
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-12 09:13:56 -08:00 |
|
Nicolò Lucchesi
|
5b68107411
|
[Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 18:10:05 +01:00 |
|
Asaf Joseph Gardin
|
8fb2c135be
|
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-01-12 17:02:38 +00:00 |
|
Or Ozeri
|
9cddbdba6d
|
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 15:00:43 +00:00 |
|
Or Ozeri
|
4c16ba617f
|
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-11 08:05:36 +00:00 |
|
Or Ozeri
|
2a4dbe24ea
|
[BugFix] Wait for compute before offloading KV to CPU (#31341)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-10 22:25:08 +00:00 |
|
Or Ozeri
|
028599739d
|
[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-10 12:39:25 -08:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Yifan Qiao
|
cd4a95e3aa
|
[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2026-01-09 10:53:20 -08:00 |
|
inkcherry
|
4505849b30
|
[ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2026-01-09 14:01:57 +00:00 |
|
Andreas Karatzas
|
e02706d2d2
|
[ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-09 20:48:32 +08:00 |
|
Nick Hill
|
29ce48221c
|
[Cleanup] Remove obsolete spec decoding compatibility logic (#32003)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 05:44:18 +00:00 |
|
zhrrr
|
8ff4a99566
|
[Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 02:31:50 +00:00 |
|
Nick Hill
|
11cec296dd
|
[BugFix] Add spec-decode-incompatible request param validation (#31982)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 00:08:21 +00:00 |
|
Lucas Wilkinson
|
6cdf015c3c
|
[Misc] Fix "Current vLLM config is not set." warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 15:20:49 -08:00 |
|
Ryan Rock
|
8cbdc7eb94
|
[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-08 09:00:24 +00:00 |
|
Lumosis
|
b634e619bb
|
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
|
2026-01-08 09:00:07 +00:00 |
|
Andreas Karatzas
|
5f2a473ff3
|
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-08 14:37:50 +08:00 |
|
Andreas Karatzas
|
087a138963
|
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-08 04:35:25 +00:00 |
|
Richard Zou
|
a79079feef
|
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-08 04:04:58 +00:00 |
|
Nick Hill
|
10ef65eded
|
[BugFix] Fix bad words with speculative decoding (#31908)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-07 15:46:42 -05:00 |
|
Kfir Toledo
|
b89443b8d9
|
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
|
2026-01-07 16:59:43 +00:00 |
|
weiyu
|
e7596371a4
|
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
|
2026-01-07 16:07:16 +08:00 |
|
Benjamin Chislett
|
f7008ce1c4
|
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-06 18:50:37 +00:00 |
|
Lucas Wilkinson
|
e0327c9db2
|
[Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-06 04:05:17 -08:00 |
|
John Calderon
|
2f4e6548ef
|
[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-01-06 00:23:00 +00:00 |
|
Wentao Ye
|
af9a7ec255
|
[Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-05 22:31:21 +00:00 |
|