omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics (#27942)
...
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time taken to load/store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
2026-01-27 13:34:49 +00:00
Or Ozeri
421012b63a
OffloadingConnector: Support kernel_block_size != block_size (#30692)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Lucas Wilkinson
6cdf015c3c
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything (#31659)
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams (#29013)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Zhewen Li
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388)
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-23 07:55:00 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Or Ozeri
7ac67ea525
[KV offload][3/N] Add worker-side CPU support (#21448)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 09:53:45 -07:00