[Model] Add LFM2-ColBERT-350M support (#37528)

Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
Ilya Boytsov
2026-03-20 15:57:57 +01:00
committed by GitHub
parent 9f6d9dd371
commit 8b6c6b9505
6 changed files with 125 additions and 1 deletions


@@ -11,6 +11,7 @@ vLLM supports ColBERT models with multiple encoder backbones:
| `HF_ColBERT` | BERT | `answerdotai/answerai-colbert-small-v1`, `colbert-ir/colbertv2.0` |
| `ColBERTModernBertModel` | ModernBERT | `lightonai/GTE-ModernColBERT-v1` |
| `ColBERTJinaRobertaModel` | Jina XLM-RoBERTa | `jinaai/jina-colbert-v2` |
| `ColBERTLfm2Model` | LFM2 | `LiquidAI/LFM2-ColBERT-350M` |

**BERT-based ColBERT** models work out of the box:
@@ -29,6 +30,10 @@ vllm serve lightonai/GTE-ModernColBERT-v1 \
vllm serve jinaai/jina-colbert-v2 \
    --hf-overrides '{"architectures": ["ColBERTJinaRobertaModel"]}' \
    --trust-remote-code

# LFM2 backbone
vllm serve LiquidAI/LFM2-ColBERT-350M \
    --hf-overrides '{"architectures": ["ColBERTLfm2Model"]}'
```
Then you can use the rerank API:
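As an illustration, a rerank request can be built with nothing but the standard library. This is a hedged sketch, not vLLM's own client code: the port, query, and document strings are placeholders, and the `/rerank` path follows vLLM's Jina-style rerank endpoint.

```python
import json
import urllib.request

def build_rerank_request(base_url, model, query, documents):
    """Build a POST request for a Jina-style rerank endpoint (placeholder values)."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        base_url + "/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_rerank_request(
    "http://localhost:8000",
    "LiquidAI/LFM2-ColBERT-350M",
    "what is colbert?",
    ["ColBERT is a late-interaction retrieval model.", "Bananas are yellow."],
)
print(req.full_url)  # → http://localhost:8000/rerank

# Sending it requires a running `vllm serve` instance, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The response lists the documents with relevance scores; the exact schema is defined by the server's rerank API.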


@@ -39,6 +39,7 @@ Models of any architecture can be converted into embedding models using `--conve
| Architecture | Models | Example HF Models | [LoRA](../../features/lora.md) | [PP](../../serving/parallelism_scaling.md) |
| ------------ | ------ | ----------------- | -------------------- | ------------------------- |
| `ColBERTLfm2Model` | LFM2 | `LiquidAI/LFM2-ColBERT-350M` | | |
| `ColBERTModernBertModel` | ModernBERT | `lightonai/GTE-ModernColBERT-v1` | | |
| `ColBERTJinaRobertaModel` | Jina XLM-RoBERTa | `jinaai/jina-colbert-v2` | | |
| `HF_ColBERT` | BERT | `answerdotai/answerai-colbert-small-v1`, `colbert-ir/colbertv2.0` | | |
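All of the backbones above are ColBERT-style late-interaction models: queries and documents are encoded into per-token vectors and scored with MaxSim (for each query token, take the maximum dot product against the document's tokens, then sum). A toy sketch of that scoring rule, with made-up 2-d vectors rather than real model embeddings:

```python
# Toy illustration of ColBERT's MaxSim scoring (not vLLM's implementation).
def maxsim_score(query_vecs, doc_vecs):
    """Sum over query tokens of the max dot product with any document token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]          # two query-token vectors
doc_a = [[0.9, 0.1], [0.2, 0.8]]          # aligns well with both query tokens
doc_b = [[0.1, 0.1], [0.0, 0.2]]          # weak alignment

print(maxsim_score(query, doc_a))  # → 1.7
print(maxsim_score(query, doc_b))  # → 0.3
```

Because each query token independently picks its best-matching document token, MaxSim captures fine-grained term overlap that a single pooled embedding would average away.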