[Feat] allow inplace loading lora (#31326)

Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Jackmin801
2026-01-19 18:15:20 -08:00
committed by GitHub
parent 05dc4bfab6
commit 12dab78f49
10 changed files with 262 additions and 7 deletions

@@ -210,6 +210,24 @@ Alternatively, follow these example steps to implement your own plugin:
For more details, refer to the [vLLM's Plugins System](../design/plugin_system.md).
### In-Place LoRA Reloading
When dynamically loading LoRA adapters, you may need to replace an existing adapter with updated weights while keeping the same name. This need commonly arises in asynchronous reinforcement learning setups, where adapters are continuously updated and swapped in without interrupting ongoing inference. The `load_inplace` parameter of the load request supports this kind of replacement.
When `load_inplace=True`, vLLM will replace the existing adapter with the new one.
Example request to load or replace a LoRA adapter with the same name:
```bash
curl -X POST http://localhost:8000/v1/load_lora_adapter \
    -H "Content-Type: application/json" \
    -d '{
        "lora_name": "my-adapter",
        "lora_path": "/path/to/adapter/v2",
        "load_inplace": true
    }'
```
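The same request can also be issued from a training loop. The following is a minimal sketch in Python using only the standard library; the helper names `build_load_lora_request` and `reload_adapter`, the base URL, and the timeout are illustrative assumptions, not part of vLLM. Only the endpoint path and the three JSON fields come from the example above.

```python
import json
import urllib.request


def build_load_lora_request(base_url, lora_name, lora_path, load_inplace=True):
    """Build (but do not send) a POST request for /v1/load_lora_adapter.

    Hypothetical helper: the payload mirrors the curl example above.
    """
    payload = {
        "lora_name": lora_name,
        "lora_path": lora_path,
        "load_inplace": load_inplace,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/load_lora_adapter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reload_adapter(base_url, lora_name, lora_path, timeout=60):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for non-2xx responses, so callers
    that want to retry should catch that exception.
    """
    request = build_load_lora_request(base_url, lora_name, lora_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For instance, after writing a new checkpoint, a trainer could call `reload_adapter("http://localhost:8000", "my-adapter", "/path/to/adapter/v2")` so that subsequent requests for `my-adapter` use the updated weights.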
## New format for `--lora-modules`
In the previous version, users would provide LoRA modules via the following format, either as a key-value pair or in JSON format. For example: