[Feat] allow inplace loading lora (#31326)

Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-19 18:15:20 -08:00
parent 05dc4bfab6
commit 12dab78f49
10 changed files with 262 additions and 7 deletions
--- a/docs/features/lora.md
+++ b/docs/features/lora.md
@@ -210,6 +210,24 @@ Alternatively, follow these example steps to implement your own plugin:

    For more details, refer to the [vLLM's Plugins System](../design/plugin_system.md).

+### In-Place LoRA Reloading
+
+When dynamically loading LoRA adapters, you may need to replace an existing adapter with updated weights while keeping the same name. This is common in asynchronous reinforcement learning setups, where adapters are continuously updated and swapped in without interrupting ongoing inference. The `load_inplace` parameter enables this replacement.
+
+When `load_inplace=True`, vLLM replaces the existing adapter that has the same name with the newly loaded one.
+
+Example request that loads a LoRA adapter, replacing any existing adapter with the same name:
+
+```bash
+curl -X POST http://localhost:8000/v1/load_lora_adapter \
+-H "Content-Type: application/json" \
+-d '{
+    "lora_name": "my-adapter",
+    "lora_path": "/path/to/adapter/v2",
+    "load_inplace": true
+}'
+```
+
 ## New format for `--lora-modules`

 In the previous version, users would provide LoRA modules via the following format, either as a key-value pair or in JSON format. For example: