Jiayi Yan
6a895197fa
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 17:05:46 +00:00
Or Ozeri
f2ad952f40
[BugFix][kv_offload]: Fix kernel block size detection (#35125)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-02-26 16:29:34 +00:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics (#27942)
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time taken to load/store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
2026-01-27 13:34:49 +00:00
Or Ozeri
421012b63a
OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Or Ozeri
4c16ba617f
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-11 08:05:36 +00:00
Or Ozeri
2a4dbe24ea
[BugFix] Wait for compute before offloading KV to CPU (#31341)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-10 22:25:08 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams (#29013)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py (#27374)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 19:08:06 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Or Ozeri
7ac67ea525
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 09:53:45 -07:00