Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Author: Or Ozeri
Date: 2026-01-28 14:36:00 +02:00
Committed by: GitHub
Parent: 247d1a32ea
Commit: 2e8de86777
5 changed files with 88 additions and 307 deletions


@@ -184,15 +184,6 @@ Support use case: Prefill with 'HND' and decode with 'NHD' with experimental con
--kv-transfer-config '{..., "enable_permute_local_kv":"True"}'
```
### Cross layers blocks
This feature is disabled by default. When enabled on an attention backend that supports it, each logical block is laid out contiguously in physical memory across layers, which reduces the number of buffers that need to be transferred.
To enable this feature:
```bash
--kv-transfer-config '{..., "kv_connector_extra_config": {"enable_cross_layers_blocks": "True"}}'
```
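Since the config is a JSON string embedded in a shell argument, quoting mistakes are easy to make. The sketch below validates the string before launching; the `kv_connector` and `kv_role` values are assumptions for illustration, and only the `kv_connector_extra_config` key comes from this section.

```shell
# Sketch: assemble the kv-transfer-config JSON and verify it parses before
# handing it to vLLM. "kv_connector" and "kv_role" values here are assumed
# placeholders; only "kv_connector_extra_config" is documented above.
CONFIG='{"kv_connector": "NixlConnector", "kv_role": "kv_both", "kv_connector_extra_config": {"enable_cross_layers_blocks": "True"}}'

# Fail fast on malformed JSON (a stray quote in shell escaping is easy to miss):
echo "$CONFIG" | python3 -m json.tool > /dev/null && echo "config OK"

# Then launch, e.g.:
#   vllm serve <model> --kv-transfer-config "$CONFIG"
```

Validating up front turns a runtime parse error inside the server into an immediate, obvious failure at the shell prompt.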
## Example Scripts/Code
Refer to these example scripts in the vLLM repository: