Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
@@ -213,6 +213,15 @@ Support use case: Prefill with 'HND' and decode with 'NHD' with experimental con
```bash
--kv-transfer-config '{..., "enable_permute_local_kv":"True"}'
```
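The permutation between the two layouts amounts to swapping the head and token axes of each KV block. A minimal sketch of that transformation, using pure Python nested lists and hypothetical per-block shapes (the dimension sizes below are illustrative, not vLLM defaults):

```python
# hypothetical per-block shapes for illustration:
# HND = (num_heads, block_size, head_dim), NHD = (block_size, num_heads, head_dim)
num_heads, block_size, head_dim = 2, 4, 8

# a block in HND layout; each element records its (head, token, dim) coordinates
block_hnd = [[[(h, t, d) for d in range(head_dim)]
              for t in range(block_size)]
             for h in range(num_heads)]

# permuting HND -> NHD swaps the first two axes
block_nhd = [[block_hnd[h][t] for h in range(num_heads)]
             for t in range(block_size)]

# token 1 / head 0 holds the same data under both layouts
assert block_nhd[1][0] == block_hnd[0][1]
```

In practice the permutation is done on device tensors (e.g. a transpose of the first two axes) rather than on Python lists; this sketch only shows the index mapping.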
### Cross layers blocks
This feature is disabled by default. When enabled on an attention backend that supports it, each logical block is laid out contiguously in physical memory across all layers, which reduces the number of buffers that need to be transferred.
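The reduction in buffer count can be seen with a quick back-of-the-envelope calculation (the sizes below are illustrative assumptions, not vLLM defaults):

```python
# illustrative sizes (assumptions, not vLLM defaults)
num_layers, num_blocks = 32, 8

# per-layer layout: one transfer buffer per (layer, block) pair
per_layer_buffers = num_layers * num_blocks  # 256

# cross-layers layout: each block is contiguous across all layers,
# so a single buffer per block suffices
cross_layers_buffers = num_blocks  # 8

assert per_layer_buffers // cross_layers_buffers == num_layers
```

So with a cross-layers layout, the number of buffers to register and transfer shrinks by a factor equal to the number of layers.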
To enable this feature:
```bash
--kv-transfer-config '{..., "kv_connector_extra_config": {"enable_cross_layers_blocks": "True"}}'
```
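As a sketch, the flag might be combined with a full `vllm serve` invocation as follows; the model name and the other `kv-transfer-config` fields shown here (`kv_connector`, `kv_role`) are illustrative values, not part of this change:

```shell
# hypothetical full invocation; model and connector fields are example values
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "kv_both", "kv_connector_extra_config": {"enable_cross_layers_blocks": "True"}}'
```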
## Example Scripts/Code
Refer to these example scripts in the vLLM repository: