Commit Graph

136 Commits

Author SHA1 Message Date
Ryan Rock
87d319c52f [AMD][CI] Support Triton attention with ExampleConnector (#34931)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2026-03-01 09:58:07 +02:00
Andreas Karatzas
067c5d9ad1 [ROCm][CI] Added MI325 mirrors (#34923)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-24 13:37:15 -08:00
Nicolò Lucchesi
ab6f3487a6 [PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:34:57 -08:00
rasmith
0c1dc42748 [CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-19 21:32:40 -08:00
Cyrus Leung
ea5ff3a1f6 [Refactor] Simplify BOS/EOS token handling (#34435)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 18:18:24 -08:00
Seiji Eicher
aca5967416 [KV Connector] Add missing method overrides to MultiConnector (#33292)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2026-02-06 12:58:21 -05:00
liranschour
8322d4e47f Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-05 02:17:02 -08:00
Mark McLoughlin
2abd97592f [KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-02-05 09:57:27 +02:00
Or Ozeri
8e32690869 [KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255)
Fixes a not-yet-reported case where it was possible for blocks to be
freed by an abort before an async transfer completed, resulting
in corrupted KV data.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-02-04 11:16:34 +00:00
Nicolò Lucchesi
528b3076af [CI][Bugfix] Fix flaky tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency (#33555)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-02 03:01:29 -08:00
Or Ozeri
2e8de86777 Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-28 04:36:00 -08:00
omerpaz95
7227d06156 [Metrics] [KVConnector] Add Offloading Connector metrics (#27942)
Added queries and hits metrics for the Offloading Connector.

Also added timing metrics for store and load operations, which take the
average time it takes to load/store, per-token.

The metrics are available from Prometheus and from the StatLogger.

Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
2026-01-27 13:34:49 +00:00
ElizaWszola
a28b94e6ef [Performance] Split FlashAttn attention and cache update (#25954)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2026-01-23 17:28:06 -08:00
liranschour
64e3d67ac0 Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-01-22 10:12:58 +00:00
Nick Hill
6f067b1fb7 [Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 11:16:37 +08:00
Or Ozeri
7013e9ac8f OffloadingConnector: Prevent redundant loads (#29087)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-21 01:15:42 +00:00
Michael Goin
1be5a73571 [UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-15 18:55:11 +00:00
Nicolò Lucchesi
f8bd8394e3 [NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 20:38:49 +00:00
Nicolò Lucchesi
5b68107411 [Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 18:10:05 +01:00
Or Ozeri
9cddbdba6d OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 15:00:43 +00:00
Or Ozeri
4c16ba617f [KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-11 08:05:36 +00:00
Or Ozeri
2a4dbe24ea [BugFix] Wait for compute before offloading KV to CPU (#31341)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-10 22:25:08 +00:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
inkcherry
4505849b30 [ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
Andreas Karatzas
e02706d2d2 [ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 20:48:32 +08:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
Lumosis
b634e619bb Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
2026-01-08 09:00:07 +00:00
Kfir Toledo
b89443b8d9 [KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
2026-01-07 16:59:43 +00:00
Sage
39512aba72 [Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
2025-12-30 00:17:16 +00:00
Cyrus Leung
aa3868ecfe [Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 05:38:46 -08:00
Mark McLoughlin
f790068600 [Core] Add a random suffix to frontend-provided request IDs (#27987)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-23 13:05:39 -08:00
Cyrus Leung
bb62dda2c3 [Misc] Introduce encode_*_url utility function (#31208)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-23 13:45:21 +00:00
Nicolò Lucchesi
bc3700e0cd [NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-18 11:53:30 +08:00
Matthew Bonanni
7eb6cb6c18 [Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Chendi.Xue
ae2e503dda [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-14 15:38:28 +00:00
Martin Hickey
f4417f8449 [KVConnector] Add KV events to KV Connectors (#28309)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2025-12-11 15:30:29 +01:00
Will Eaton
a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Or Ozeri
4c6fd25880 kv_transfer: Rename the shared storage connectors (#30201)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
rasmith
b12f4a9830 [CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-05 20:57:38 -08:00
Samuel Shen
7e31c3a3f6 [CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
2025-12-06 12:53:34 +08:00
Matthew Bonanni
66e674cdd5 [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Mark McLoughlin
dff0a2b394 [NIXL] Add remote_request_id to kv_transfer_params (#29665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 09:43:48 -08:00
Nicolò Lucchesi
78c44fd722 [NIXL] Small cleanup of unused variables (#29618)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-05 18:17:36 +01:00
Mark McLoughlin
949a6a19d2 [NIXL] Add compatibility checking to NIXL KV connector handshake (#29503)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 15:52:45 +01:00
rasmith
5aa9b09040 [CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-03 22:56:35 +08:00
Harry Mellor
951445a52d Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00
Matthew Bonanni
fc1d8be3dc [Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
rasmith
71362ffab4 [CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-23 04:42:49 +00:00