# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Example of using ColBERT late interaction models for reranking and scoring.

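ColBERT (Contextualized Late Interaction over BERT) encodes the query and each
document into per-token embeddings and scores them with MaxSim: every query
token is matched to its most similar document token, and these maxima are
summed. This is typically more accurate than single-vector embedding models
and cheaper than cross-encoders, because document token embeddings can be
computed independently of the query.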

vLLM supports ColBERT with multiple encoder backbones. Start the server
with one of the following:

    # BERT backbone (works out of the box)
    vllm serve answerdotai/answerai-colbert-small-v1

    # ModernBERT backbone
    vllm serve lightonai/GTE-ModernColBERT-v1 \
        --hf-overrides '{"architectures": ["ColBERTModernBertModel"]}'

    # Jina XLM-RoBERTa backbone
    vllm serve jinaai/jina-colbert-v2 \
        --hf-overrides '{"architectures": ["ColBERTJinaRobertaModel"]}' \
        --trust-remote-code

Then run this script:
    python colbert_rerank_online.py
"""

import json

import requests

# Change this to match the model you started the server with
MODEL = "answerdotai/answerai-colbert-small-v1"
BASE_URL = "http://127.0.0.1:8000"

headers = {"accept": "application/json", "Content-Type": "application/json"}

documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Python is a programming language.",
    "Deep learning uses neural networks for complex tasks.",
    "The weather today is sunny.",
]
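

# Illustrative sketch, not vLLM's implementation: a hypothetical client-side
# `maxsim` helper showing the MaxSim scoring described in the module
# docstring. For each query token embedding, it takes the highest dot product
# against any document token embedding and sums those maxima. It assumes the
# embeddings are already L2-normalized (so the dot product equals cosine
# similarity) and does not normalize them itself. The server computes the
# scores returned by /rerank and /score; this helper is not called below.
def maxsim(query_embs: list[list[float]], doc_embs: list[list[float]]) -> float:
    """Return the MaxSim score of one query against one document."""
    total = 0.0
    for q in query_embs:
        # Similarity of this query token to its best-matching document token
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_embs)
    return total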


def rerank_example():
    """Use the /rerank endpoint to rank documents by query relevance."""
    print("=== Rerank Example ===")

    data = {
        "model": MODEL,
        "query": "What is machine learning?",
        "documents": documents,
    }

    response = requests.post(f"{BASE_URL}/rerank", headers=headers, json=data)
    # Fail fast with a clear HTTP error instead of a KeyError on "results"
    response.raise_for_status()
    result = response.json()
    print(json.dumps(result, indent=2))

    print("\nRanked documents (most relevant first):")
    for item in result["results"]:
        doc_idx = item["index"]
        score = item["relevance_score"]
        print(f"  Score {score:.4f}: {documents[doc_idx]}")


def score_example():
    """Use the /score endpoint for pairwise query-document scoring."""
    print("\n=== Score Example ===")

    data = {
        "model": MODEL,
        "text_1": "What is machine learning?",
        "text_2": [
            "Machine learning is a subset of AI.",
            "The weather is sunny.",
        ],
    }

    response = requests.post(f"{BASE_URL}/score", headers=headers, json=data)
    response.raise_for_status()
    result = response.json()
    print(json.dumps(result, indent=2))
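

# Illustrative sketch: a hypothetical `pair_scores` helper that matches each
# /score result to its input text, e.g. pair_scores(result, data["text_2"])
# inside score_example(). It assumes the response has a "data" list whose
# entries carry "index" and "score" fields, which matches vLLM's score
# response at the time of writing; compare with the JSON printed by
# score_example() if your server version differs.
def pair_scores(result: dict, texts: list[str]) -> list[tuple[float, str]]:
    """Return (score, text) pairs sorted from most to least relevant."""
    pairs = [(item["score"], texts[item["index"]]) for item in result["data"]]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)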


def main():
    rerank_example()
    score_example()


if __name__ == "__main__":
    main()