Yifan Qiao
|
f0d005864a
|
[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
(cherry picked from commit a01ef3fa51)
|
2026-02-02 10:31:50 -08:00 |
|
Or Ozeri
|
fe18ce4d3f
|
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
(cherry picked from commit 2e8de86777)
|
2026-01-28 11:44:59 -08:00 |
|
Cyrus Leung
|
11b556878b
|
[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-26 15:00:28 +08:00 |
|
Joshua Deng
|
91601ff478
|
[Feature] add session based streaming input support to v1 (#28973)
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-24 12:06:28 -08:00 |
|
7. Sun
|
cd775bdbe0
|
[Tests] Replace flaky sleep with polling in test_background_cancel (#32986)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 16:39:07 +00:00 |
|
7. Sun
|
0ccecf8833
|
[Tests] Standardize RNG seed utility across test files (#32982)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 06:47:14 +00:00 |
|
7. Sun
|
0b9a735e11
|
[Tests] Clarify pytest skip reasons with actionable context (#32981)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 06:38:50 +00:00 |
|
ElizaWszola
|
a28b94e6ef
|
[Performance] Split FlashAttn attention and cache update (#25954)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2026-01-23 17:28:06 -08:00 |
|
Lucas Wilkinson
|
3a41459501
|
[cudagraphs] Refactor cudagraph capture loop (#32946)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-23 13:22:20 -07:00 |
|
Harry Huang
|
5206e5e28c
|
[V1][Hybrid] Mamba Prefix Caching with align mode (#30877)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2026-01-23 09:56:48 -08:00 |
|
David Ramon Prados
|
3a63be0faa
|
Support custom URI schemes and trace handlers for profiler (#32393)
|
2026-01-22 09:45:40 -08:00 |
|
Matt
|
c517d8c934
|
[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-23 00:59:15 +08:00 |
|
Cyrus Leung
|
d117a4d1a9
|
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 12:44:22 +00:00 |
|
Or Ozeri
|
421012b63a
|
OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-22 12:30:04 +00:00 |
|
liranschour
|
64e3d67ac0
|
Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2026-01-22 10:12:58 +00:00 |
|
Andreas Karatzas
|
a810299838
|
[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-21 22:11:09 -08:00 |
|
Micah Williamson
|
019e2c3b7c
|
[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-22 05:47:33 +00:00 |
|
Lucas Wilkinson
|
889722f3bf
|
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 22:02:39 -07:00 |
|
Divakar Verma
|
49d9653852
|
[ROCm][CI] fix get_valid_backends (#32787)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-01-22 04:27:47 +00:00 |
|
knlnguyen1802
|
378385b90c
|
[EC Connector] Optimize remote cache check in scheduler (#32585)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
|
2026-01-22 03:30:59 +00:00 |
|
Wentao Ye
|
6437ff1fb9
|
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 02:25:16 +00:00 |
|
elvischenv
|
808d6fd7b9
|
Bump Flashinfer to v0.6.1 (#30993)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-01-21 08:49:50 -08:00 |
|
Lucas Wilkinson
|
b4f64e5b02
|
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 13:03:37 +08:00 |
|
Nick Hill
|
6f067b1fb7
|
[Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-21 11:16:37 +08:00 |
|
Or Ozeri
|
7013e9ac8f
|
OffloadingConnector: Prevent redundant loads (#29087)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-21 01:15:42 +00:00 |
|
Lucas Wilkinson
|
2261340806
|
[Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-20 15:05:48 -05:00 |
|
Rahul Tuli
|
f0feb1cf81
|
Test: added acceptance length tests (#32030)
Signed-off-by: rahul-tuli <rtuli@redhat.com>
|
2026-01-20 18:55:15 +00:00 |
|
Tomas Ruiz
|
4a5299c93f
|
feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2026-01-19 16:05:46 -05:00 |
|
Vadim Gimpelson
|
0727cc9ecf
|
[BUGFIX] Fix test_mla_backends.py. Scale MLA projection weights to prevent numerical instability (#32529)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-01-19 13:49:29 -05:00 |
|
Andreas Karatzas
|
c0a350ca73
|
[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-19 09:57:54 +00:00 |
|
Michael Goin
|
1be5a73571
|
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-15 18:55:11 +00:00 |
|
Wentao Ye
|
b34474bf2c
|
[Feature] Support async scheduling + PP (#32359)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-15 12:06:23 -05:00 |
|
Chauncey
|
707b44cc28
|
[Refactor] [11/N] to simplify the mcp architecture (#32396)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 18:49:31 +08:00 |
|
Cyrus Leung
|
cbbae38f93
|
[2/N] Move cache factories to MM registry (#32382)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 01:02:30 -08:00 |
|
dtc
|
1e584823f8
|
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-01-15 16:31:16 +08:00 |
|
Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
Micah Williamson
|
773d7073ae
|
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-15 04:53:43 +00:00 |
|
Ryan Rock
|
15422ed3f7
|
[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-15 04:01:42 +00:00 |
|
Lumosis
|
66652e8082
|
[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
|
2026-01-14 20:10:01 +00:00 |
|
Matthew Bonanni
|
2263d44b68
|
[4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-13 09:08:45 -08:00 |
|
Matthew Bonanni
|
98f60e5acb
|
[6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-13 05:38:52 -08:00 |
|
Chauncey
|
fefce49807
|
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 13:01:39 +00:00 |
|
Andreas Karatzas
|
df7e12715f
|
[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-13 15:14:30 +08:00 |
|
Nicolò Lucchesi
|
f8bd8394e3
|
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 20:38:49 +00:00 |
|
Or Ozeri
|
2be765b68a
|
[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 18:39:38 +00:00 |
|
Matthew Bonanni
|
20228cb851
|
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-12 09:13:56 -08:00 |
|
Nicolò Lucchesi
|
5b68107411
|
[Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 18:10:05 +01:00 |
|
Asaf Joseph Gardin
|
8fb2c135be
|
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-01-12 17:02:38 +00:00 |
|
Or Ozeri
|
9cddbdba6d
|
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 15:00:43 +00:00 |
|
Or Ozeri
|
4c16ba617f
|
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-11 08:05:36 +00:00 |
|