feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com> Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
This commit is contained in:
@@ -307,6 +307,62 @@ An OpenAI client example can be found here: [examples/pooling/embed/openai_embed
## Specific models
### ColBERT Late Interaction Models
[ColBERT](https://arxiv.org/abs/2004.12832) (Contextualized Late Interaction over BERT) is a retrieval model that uses per-token embeddings and MaxSim scoring for document ranking. Unlike single-vector embedding models, ColBERT retains token-level representations and computes relevance scores through late interaction, offering better accuracy than single-vector models while being more efficient than cross-encoders.
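The MaxSim operation described above is simple to sketch. The following is an illustrative NumPy snippet, not vLLM code (`maxsim_score` is a hypothetical helper): for each query token, take its maximum similarity over all document tokens, then sum those maxima.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction MaxSim score.

    query_emb: (num_query_tokens, dim) per-token query embeddings.
    doc_emb:   (num_doc_tokens, dim) per-token document embeddings.
    For each query token, take the max dot-product similarity over all
    document tokens, then sum over query tokens.
    """
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query

# Toy example: 2 query tokens, 3 document tokens, dim 2.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(maxsim_score(q, d))  # each query token matches one doc token exactly -> 2.0
```

Note that the score is an unnormalized sum over query tokens, so scores are comparable across documents for the same query but not across queries of different lengths.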
vLLM supports ColBERT models for reranking tasks, automatically applying MaxSim scoring for query-document relevance:
```shell
vllm serve answerdotai/answerai-colbert-small-v1
```
Currently supports ColBERT models with standard BERT encoders (e.g., `answerdotai/answerai-colbert-small-v1`, `colbert-ir/colbertv2.0`).
ColBERT models with modified encoder architectures are not yet supported, including BERT variants with rotary embeddings (e.g., `jinaai/jina-colbert-v2`) or other custom encoders (e.g., `LiquidAI/LFM2-ColBERT-350M`).
If your standard BERT ColBERT model's config doesn't specify the architecture as `HF_ColBERT`, override it with:
```shell
vllm serve your-colbert-model --hf-overrides '{"architectures": ["HF_ColBERT"]}'
```
Then you can use the rerank endpoint:
```shell
curl -s http://localhost:8000/rerank -H "Content-Type: application/json" -d '{
"model": "answerdotai/answerai-colbert-small-v1",
"query": "What is machine learning?",
"documents": [
"Machine learning is a subset of artificial intelligence.",
"Python is a programming language.",
"Deep learning uses neural networks."
]
}'
```
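In Python, the rerank response can then be consumed along these lines. This is a sketch that assumes a Jina-style rerank response shape with a `results` list carrying `index` and `relevance_score` fields; check your server's actual output before relying on these names.

```python
# Hypothetical response payload (field names are an assumption, not
# verified against the server): a "results" list mapping each document
# index back to its relevance score.
response = {
    "results": [
        {"index": 1, "relevance_score": 0.12},
        {"index": 0, "relevance_score": 0.93},
        {"index": 2, "relevance_score": 0.47},
    ]
}

documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Python is a programming language.",
    "Deep learning uses neural networks.",
]

# Sort results by descending relevance and map indices back to documents.
ranked = sorted(response["results"],
                key=lambda r: r["relevance_score"], reverse=True)
for r in ranked:
    print(f"{r['relevance_score']:.2f}  {documents[r['index']]}")
```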
Or the score endpoint:
```shell
curl -s http://localhost:8000/score -H "Content-Type: application/json" -d '{
"model": "answerdotai/answerai-colbert-small-v1",
"text_1": "What is machine learning?",
"text_2": ["Machine learning is a subset of AI.", "The weather is sunny."]
}'
```
You can also get the raw per-token embeddings using the pooling endpoint with the `token_embed` task:
```shell
curl -s http://localhost:8000/pooling -H "Content-Type: application/json" -d '{
"model": "answerdotai/answerai-colbert-small-v1",
"input": "What is machine learning?",
"task": "token_embed"
}'
```
An example can be found here: [examples/pooling/score/colbert_rerank_online.py](../../examples/pooling/score/colbert_rerank_online.py)
### BAAI/bge-m3
The `BAAI/bge-m3` model comes with extra weights for sparse and colbert embeddings but unfortunately in its `config.json`