Or Ozeri
512c5eb455
[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-04-08 19:50:28 +03:00
Ronen Schaffer
7c139ab23f
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-04-07 15:14:45 +03:00
wliao2
32e0c0bfa2
refactor hard coded device string in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
2026-04-03 11:21:47 +08:00
Or Ozeri
7cc302dd87
[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-27 08:38:33 +03:00
Ronen Schaffer
e3c6c10cad
[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (#37874)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-03-24 07:02:51 +02:00
Or Ozeri
5dd8df0701
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-18 19:26:40 +02:00
Andreas Karatzas
ce2ef42fd3
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 05:26:20 +00:00
Srinivasoo7
106ff69c4e
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-10 14:43:40 +00:00
Ronen Schaffer
bb6888b8b1
[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() (#35846)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-03-04 14:25:33 +02:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics (#27942)
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report
the average per-token load/store time.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
2026-01-27 13:34:49 +00:00
Or Ozeri
421012b63a
OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Or Ozeri
9cddbdba6d
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 15:00:43 +00:00
Or Ozeri
4c16ba617f
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-11 08:05:36 +00:00
Lucas Wilkinson
6cdf015c3c
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
Andreas Karatzas
5f2a473ff3
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 14:37:50 +08:00
Or Ozeri
d8e38d4939
Triton Attention: Support cross-layers blocks (#30687)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-05 19:29:16 +00:00
wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams (#29013)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
Micah Williamson
43c5792592
[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-27 07:54:44 +00:00
rasmith
8e22da1d7f
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py (#29229)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-22 11:00:54 +00:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
alberto
bac904565f
Implement ARC KV cache eviction policy for CPU offloader (#27039)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: alberto <aperdomo@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2025-11-12 09:51:39 -08:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms (#27770)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-30 22:10:29 +08:00
Zhewen Li
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD (#27690)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-29 12:55:51 +00:00
Or Ozeri
111faf1118
[Core] Scheduler: Publish connector events after output (#25875)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-10-28 21:01:33 +00:00
Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-10-24 23:34:18 -07:00
Zhewen Li
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-23 07:55:00 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Or Ozeri
8db2939289
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-22 12:30:36 -07:00
Or Ozeri
7ac67ea525
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 09:53:45 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Or Ozeri
a53ad626d6
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 20:53:52 +00:00