examples/offline_inference/kv_load_failure_recovery/README.md

# KV Load Failure Recovery Test

This example builds upon the `disaggregated-prefill-v1` example in `examples/offline_inference`.

It demonstrates vLLM's ability to recover from KV load failures in both synchronous and asynchronous loading modes. The goal is to verify that vLLM correctly identifies invalid KV blocks, reschedules the affected requests, and ensures successful and consistent output.

## Files

- `prefill_example.py` – performs the prefill stage and saves KV data (same as in `disaggregated-prefill-v1`).
- `decode_example.py` – performs the decode stage. Accepts:
    - `--simulate-failure`: simulates KV load failure using a custom connector.
    - `--async-load`: enables asynchronous KV loading mode.
- `load_recovery_example_connector.py` – defines `LoadRecoveryExampleConnector`, a subclass of `ExampleConnector`, that simulates missing or corrupted external KV blocks by failing to load blocks for the first decode request.
- `run.sh` – orchestrates the test: runs the prefill stage, then three decode stages:
    1. Normal decode (baseline).
    2. Decode with simulated sync KV load failure.
    3. Decode with simulated async KV load failure.

    Finally, it compares the output of the baseline with the recovered outputs to verify correctness.

## How It Works

- The test dynamically loads `LoadRecoveryExampleConnector` via `KVTransferConfig.kv_connector_module_path`, enabling controlled simulation of load failures without modifying the original connector.
- The decode stages that simulate failure are expected to trigger recovery logic in vLLM, resulting in the same output as the baseline decode.
- If recovery fails, the script prints a unified diff of the output mismatch and exits with error.

## Usage

```bash
./run.sh
```
-												[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)

Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
											
										
										
											2025-10-01 00:57:08 +03:00
+								# KV Load Failure Recovery Test
 								This example builds upon the `disaggregated-prefill-v1` example in `examples/offline_inference`.
 								It demonstrates vLLM's ability to recover from KV load failures in both synchronous and asynchronous loading modes. The goal is to verify that vLLM correctly identifies invalid KV blocks, reschedules the affected requests, and ensures successful and consistent output.
 								## Files
 								- `prefill_example.py` – performs the prefill stage and saves KV data (same as in `disaggregated-prefill-v1`).
 								- `decode_example.py` – performs the decode stage. Accepts:
 								    - `--simulate-failure`: simulates KV load failure using a custom connector.
 								    - `--async-load`: enables asynchronous KV loading mode.
-												kv_transfer: Rename the shared storage connectors (#30201)

Signed-off-by: Or Ozeri <oro@il.ibm.com>
											
										
										
											2025-12-09 06:46:09 +02:00
+								- `load_recovery_example_connector.py` – defines `LoadRecoveryExampleConnector`, a subclass of `ExampleConnector`, that simulates missing or corrupted external KV blocks by failing to load blocks for the first decode request.
-												[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)

Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
											
										
										
											2025-10-01 00:57:08 +03:00
+								- `run.sh` – orchestrates the test: runs the prefill stage, then three decode stages:
 . Normal decode (baseline).
 . Decode with simulated sync KV load failure.
 . Decode with simulated async KV load failure.
 								    Finally, it compares the output of the baseline with the recovered outputs to verify correctness.
 								## How It Works
-												kv_transfer: Rename the shared storage connectors (#30201)

Signed-off-by: Or Ozeri <oro@il.ibm.com>
											
										
										
											2025-12-09 06:46:09 +02:00
+								- The test dynamically loads `LoadRecoveryExampleConnector` via `KVTransferConfig.kv_connector_module_path`, enabling controlled simulation of load failures without modifying the original connector.
-												[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)

Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
											
										
										
											2025-10-01 00:57:08 +03:00
+								- The decode stages that simulate failure are expected to trigger recovery logic in vLLM, resulting in the same output as the baseline decode.
 								- If recovery fails, the script prints a unified diff of the output mismatch and exits with error.
 								## Usage
 								```bash
 								./run.sh
-												[Docs] Fix format error in KV load failure recovery doc (#34137)

Signed-off-by: Jaebok Lee <jaebok9541@naver.com>
											
										
										
											2026-02-10 18:16:26 +08:00
+								```