[Model] Deprecate the score task (this will not affect users). (#37537)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-20 16:07:56 +08:00
parent dcee9be95a
commit ed359c497a
22 changed files with 184 additions and 163 deletions
--- a/docs/models/pooling_models/README.md
+++ b/docs/models/pooling_models/README.md
@@ -31,28 +31,29 @@ Of course, we also have "plugin" tasks that allow users to customize input and o

 ### Pooling Tasks

-| Pooling Tasks      | Granularity   | Outputs                                         |
-|--------------------|---------------|-------------------------------------------------|
-| `classify`         | Sequence-wise | probability vector of classes for each sequence |
-| `score` (see note) | Sequence-wise | reranker score for each sequence                |
-| `embed`            | Sequence-wise | vector representations for each sequence        |
-| `token_classify`   | Token-wise    | probability vector of classes for each token    |
-| `token_embed`      | Token-wise    | vector representations for each token           |
+| Pooling Tasks         | Granularity   | Outputs                                         |
+|-----------------------|---------------|-------------------------------------------------|
+| `classify` (see note) | Sequence-wise | probability vector of classes for each sequence |
+| `embed`               | Sequence-wise | vector representations for each sequence        |
+| `token_classify`      | Token-wise    | probability vector of classes for each token    |
+| `token_embed`         | Token-wise    | vector representations for each token           |

 !!! note
    Within classification tasks, there is a specialized subcategory: Cross-encoder (aka reranker) models. These models are a subset of classification models that accept two prompts as input and output num_labels equal to 1.

 ### Score Types

-| Pooling Tasks      | Granularity   | Outputs                                         | Score Types        | scoring function         |
-|--------------------|---------------|-------------------------------------------------|--------------------|--------------------------|
-| `classify`         | Sequence-wise | probability vector of classes for each sequence | nan                | nan                      |
-| `score` (see note) | Sequence-wise | reranker score for each sequence                | `cross-encoder`    | linear classifier        |
-| `embed`            | Sequence-wise | vector representations for each sequence        | `bi-encoder`       | cosine similarity        |
-| `token_classify`   | Token-wise    | probability vector of classes for each token    | nan                | nan                      |
-| `token_embed`      | Token-wise    | vector representations for each token           | `late-interaction` | late interaction(MaxSim) |
+The scoring models is designed to compute similarity scores between two input prompts. It supports three model types (aka `score_type`): `cross-encoder`, `late-interaction`, and `bi-encoder`.

-The score models is designed to compute similarity scores between two input prompts. It supports three model types (aka `score_type`): `cross-encoder`, `late-interaction`, and `bi-encoder`.
+| Pooling Tasks         | Granularity   | Outputs                                      | Score Types        | scoring function         |
+|-----------------------|---------------|----------------------------------------------|--------------------|--------------------------|
+| `classify` (see note) | Sequence-wise | reranker score for each sequence             | `cross-encoder`    | linear classifier        |
+| `embed`               | Sequence-wise | vector representations for each sequence     | `bi-encoder`       | cosine similarity        |
+| `token_classify`      | Token-wise    | probability vector of classes for each token | nan                | nan                      |
+| `token_embed`         | Token-wise    | vector representations for each token        | `late-interaction` | late interaction(MaxSim) |
+
+!!! note
+    Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.

 ### Pooling Usages

@@ -85,14 +86,16 @@ enabling the corresponding APIs.

 ### Offline APIs corresponding to pooling tasks

-| Task             | APIs                                                                       |
-|------------------|----------------------------------------------------------------------------|
-| `embed`          | `LLM.embed(...)`,`LLM.encode(..., pooling_task="embed")`, `LLM.score(...)` |
-| `classify`       | `LLM.classify(...)`, `LLM.encode(..., pooling_task="classify")`            |
-| `score`          | `LLM.score(...)`                                                           |
-| `token_classify` | `LLM.reward(...)`, `LLM.encode(..., pooling_task="token_classify")`        |
-| `token_embed`    | `LLM.encode(..., pooling_task="token_embed")`, `LLM.score(...)`            |
-| `plugin`         | `LLM.encode(..., pooling_task="plugin")`                                   |
+| Task             | APIs                                                                                  |
+|------------------|---------------------------------------------------------------------------------------|
+| `embed`          | `LLM.embed(...)`, `LLM.encode(..., pooling_task="embed")`, `LLM.score(...)`(see note) |
+| `classify`       | `LLM.classify(...)`, `LLM.encode(..., pooling_task="classify")`, `LLM.score(...)`     |
+| `token_classify` | `LLM.reward(...)`, `LLM.encode(..., pooling_task="token_classify")`                   |
+| `token_embed`    | `LLM.encode(..., pooling_task="token_embed")`, `LLM.score(...)`                       |
+| `plugin`         | `LLM.encode(..., pooling_task="plugin")`                                              |
+
+!!! note
+    Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.

 ### `LLM.classify`

@@ -206,11 +209,11 @@ If `--runner pooling` has been set (manually or automatically) but the model doe
 vLLM will attempt to automatically convert the model according to the architecture names
 shown in the table below.

-| Architecture                                    | `--convert` | Supported pooling tasks               |
-| ----------------------------------------------- | ----------- | ------------------------------------- |
-| `*ForTextEncoding`, `*EmbeddingModel`, `*Model` | `embed`     | `token_embed`, `embed`                |
-| `*ForRewardModeling`, `*RewardModel`            | `embed`     | `token_embed`, `embed`                |
-| `*For*Classification`, `*ClassificationModel`   | `classify`  | `token_classify`, `classify`, `score` |
+| Architecture                                    | `--convert` | Supported pooling tasks      |
+|-------------------------------------------------|-------------|------------------------------|
+| `*ForTextEncoding`, `*EmbeddingModel`, `*Model` | `embed`     | `token_embed`, `embed`       |
+| `*ForRewardModeling`, `*RewardModel`            | `embed`     | `token_embed`, `embed`       |
+| `*For*Classification`, `*ClassificationModel`   | `classify`  | `token_classify`, `classify` |

 !!! tip
    You can explicitly set `--convert <type>` to specify how to convert the model.
@@ -251,3 +254,7 @@ Pooling models now default support all pooling, you can use it without any setti

 - Extracting hidden states prefers using `token_embed` task.
 - Named Entity Recognition (NER) and reward models prefers using `token_classify` task.
+
+### Score task
+
+`score` task is deprecated and will be removed in v0.20. Please use `classify` instead. Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.