[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)

Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
This commit is contained in:
sfeiqiang
2026-03-12 15:46:20 +08:00
committed by GitHub
parent 00726c74c9
commit 8cb24d3aed
5 changed files with 725 additions and 0 deletions

View File

@@ -44,6 +44,12 @@ For NixlConnector, you may also specify one or multiple NIXL_Backend. Such as:
--kv-transfer-config '{"kv_connector":"OffloadingConnector","kv_role":"kv_both","kv_connector_extra_config":{"block_size": 64, "cpu_bytes_to_use": 1000000000}}'
```
- **FlexKVConnectorV1**: refer to [examples/offline_inference/prefix_caching_flexkv.py](../../examples/offline_inference/prefix_caching_flexkv.py) for the example usage of FlexKVConnectorV1. FlexKV is a distributed KV Store and multi-level cache management system for ultra-large-scale LLM inference.
```bash
--kv-transfer-config '{"kv_connector":"FlexKVConnectorV1","kv_role":"kv_both"}'
```
## Benchmarks
Please refer to [benchmarks/disagg_benchmarks](../../benchmarks/disagg_benchmarks) for disaggregated prefilling benchmarks.