[Doc] Improve MM models LoRA notes (#31979)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-09 00:55:22 +08:00
parent b8112c1d85
commit 49568d5cf9
1 changed files with 1 additions and 23 deletions
--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -642,29 +642,7 @@ See [this page](../features/multimodal_inputs.md) on how to pass multi-modal inp
    For hybrid-only models such as Llama-4, Step3 and Mistral-3, a text-only mode can be enabled by setting all supported multimodal modalities to 0 (e.g., `--limit-mm-per-prompt '{"image":0}'`) so that their multimodal modules will not be loaded to free up more GPU memory for KV cache.

 !!! note
-    vLLM currently only supports dynamic LoRA adapters on the language backbone of multimodal models.
-    If you wish to use a model with LoRA in the multi-modal encoder,
-    please merge the weights into the base model first before running it in vLLM like a regular model.
-
-    ```python
-    from peft import PeftConfig, PeftModel
-    from transformers import AutoModelForImageTextToText, AutoProcessor
-
-    def merge_and_save(model_id: str, output_dir: str):
-        base_model = AutoModelForImageTextToText.from_pretrained(model_id)
-        lora_model = PeftModel.from_pretrained(
-            base_model,
-            model_id,
-            config=PeftConfig.from_pretrained(model_id),
-        )
-        model = lora_model.merge_and_unload().to(dtype=base_model.dtype)
-        model._hf_peft_config_loaded = False  # Needed to save the merged model
-
-        processor = AutoProcessor.from_pretrained(model_id)
-
-        model.save_pretrained(output_dir)
-        processor.save_pretrained(output_dir)
-    ```
+    vLLM currently supports adding LoRA adapters to the language backbone for most multimodal models. Additionally, vLLM now experimentally supports adding LoRA to the tower and connector modules for some multimodal models. See [this page](../features/lora.md).

 ### Generative Models