biondizzle/vllm - vllm

Author	SHA1	Message	Date
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Or Ozeri	f2ad952f40	[BugFix][kv_offload]: Fix kernel block size detection (#35125) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-26 16:29:34 +00:00
omerpaz95	7227d06156	[Metrics] [KVConnector] Add Offloading Connector metrics (#27942) Added queries and hits metrics for the Offloading Connector. Also added timing metrics for store and load operations, which take the average time it takes to load/store, per-token. The metrics are available from Prometheus and from the StatLogger. Signed-off-by: omerpaz95 <omerpaz95@gmail.com> Co-authored-by: Omer Paz <Omer.Paz@ibm.com>	2026-01-27 13:34:49 +00:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Or Ozeri	4c16ba617f	[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-11 08:05:36 +00:00
Or Ozeri	2a4dbe24ea	[BugFix] Wait for compute before offloading KV to CPU (#31341) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 22:25:08 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Or Ozeri	174e39ead7	CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-14 23:50:45 +00:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
Or Ozeri	c0c2dd1e0b	[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 18:55:10 +08:00
Kunshang Ji	2a2d5d2780	Replace `torch.cuda.Event` with `torch.Event` for better hardware compatibility (#26985) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-18 11:34:36 -08:00
Jonathan Chen	ca76486a16	[Chore] Separate out `vllm.utils.platform_utils.py` (#27374) Signed-off-by: Jonathan <chenleejonathan@gmail.com>	2025-10-23 19:08:06 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Or Ozeri	7ac67ea525	[KV offload][3/N] Add worker-side CPU support (#21448) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-19 09:53:45 -07:00

15 Commits