This example builds upon the `disaggregated-prefill-v1` example in `examples/offline_inference`.
It demonstrates vLLM's ability to recover from KV load failures in both synchronous and asynchronous loading modes. The goal is to verify that vLLM correctly identifies invalid KV blocks, reschedules the affected requests, and ensures successful and consistent output.
## Files
-`prefill_example.py`– performs the prefill stage and saves KV data (same as in `disaggregated-prefill-v1`).
-`decode_example.py`– performs the decode stage. Accepts:
-`--simulate-failure`: simulates KV load failure using a custom connector.
-`load_recovery_example_connector.py`– defines `LoadRecoveryExampleConnector`, a subclass of `ExampleConnector`, that simulates missing or corrupted external KV blocks by failing to load blocks for the first decode request.
- The test dynamically loads `LoadRecoveryExampleConnector` via `KVTransferConfig.kv_connector_module_path`, enabling controlled simulation of load failures without modifying the original connector.