[Doc] Improve MM models LoRA notes (#31979)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
This commit is contained in:
Jee Jee Li
2026-01-09 00:55:22 +08:00
committed by GitHub
parent b8112c1d85
commit 49568d5cf9

View File

@@ -642,29 +642,7 @@ See [this page](../features/multimodal_inputs.md) on how to pass multi-modal inp
For hybrid-only models such as Llama-4, Step3 and Mistral-3, a text-only mode can be enabled by setting all supported multimodal modalities to 0 (e.g, `--limit-mm-per-prompt '{"image":0}`) so that their multimodal modules will not be loaded to free up more GPU memory for KV cache.
!!! note
vLLM currently only supports dynamic LoRA adapters on the language backbone of multimodal models.
If you wish to use a model with LoRA in the multi-modal encoder,
please merge the weights into the base model first before running it in vLLM like a regular model.
```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
def merge_and_save(model_id: str, output_dir: str):
base_model = AutoModelForImageTextToText.from_pretrained(model_id)
lora_model = PeftModel.from_pretrained(
base_model,
model_id,
config=PeftConfig.from_pretrained(model_id),
)
model = lora_model.merge_and_unload().to(dtype=base_model.dtype)
model._hf_peft_config_loaded = False # Needed to save the merged model
processor = AutoProcessor.from_pretrained(model_id)
model.save_pretrained(output_dir)
processor.save_pretrained(output_dir)
```
vLLM currently supports adding LoRA adapters to the language backbone for most multimodal models. Additionally, vLLM now experimentally supports adding LoRA to the tower and connector modules for some multimodal models. See [this page](../features/lora.md).
### Generative Models