The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.
Many classification models support both (sequence) classification and token classification. For further details on token classification, please refer to [this page](token_classify.md).
A classification model can be used as a scoring model (with its scoring API enabled) only when it outputs `num_labels` equal to 1; please refer to [this page](scoring.md).
<sup>C</sup> Automatically converted into a classification model via `--convert classify`. ([details](./README.md#model-conversion))
\* Feature support is the same as that of the original model.
If your model is not in the above list, we will try to automatically convert the model using
[as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
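The last-token extraction described above can be sketched in plain NumPy. This is an illustrative sketch, not vLLM's internal code; the function name and toy shapes are made up for the example:

```python
import numpy as np

def classify_from_hidden_states(hidden_states: np.ndarray,
                                score_weight: np.ndarray) -> np.ndarray:
    """Illustrative sketch of last-token pooling followed by softmax.

    hidden_states: (seq_len, hidden_size) final-layer states for one prompt.
    score_weight:  (hidden_size, num_labels) classification head weights.
    """
    logits = hidden_states[-1] @ score_weight   # pool: keep only the last token
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

# Toy shapes for illustration: 5 tokens, hidden size 8, 3 labels.
rng = np.random.default_rng(0)
probs = classify_from_hidden_states(rng.normal(size=(5, 8)),
                                    rng.normal(size=(8, 3)))
```

The result is a probability distribution over the labels, one entry per class.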
### Cross-encoder Models
Cross-encoder (aka reranker) models are a subset of classification models that accept two prompts as input and have `num_labels` equal to 1. Most classification models can also be used as [cross-encoder models](scoring.md#cross-encoder-models); for more information, please refer to [this page](scoring.md).
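A minimal offline scoring sketch, assuming a recent vLLM version (the checkpoint name is illustrative; any classification model with `num_labels` equal to 1 can be scored the same way). Running it requires downloading the model, so it is a usage sketch rather than a self-checking snippet:

```python
from vllm import LLM

# Illustrative reranker checkpoint; replace with your own model.
llm = LLM(model="BAAI/bge-reranker-v2-m3", runner="pooling")

# score() takes the two prompts of a cross-encoder pair.
(output,) = llm.score("What is the capital of France?",
                      "Paris is the capital of France.")
print(output.outputs.score)
```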
More examples can be found here: [examples/pooling/classify](../../../examples/pooling/classify)
## Supported Features
### Enable/disable activation
You can enable or disable activation via `use_activation`.
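As a rough illustration of what the flag controls (a sketch, not vLLM's actual pooler code): with a single label the activation is typically a sigmoid, with multiple labels a softmax, and disabling it returns the raw logits:

```python
import math

def apply_activation(logits: list[float], use_activation: bool = True) -> list[float]:
    """Sketch of classify-pooling activation: sigmoid for num_labels == 1,
    softmax otherwise; raw logits when activation is disabled."""
    if not use_activation:
        return logits
    if len(logits) == 1:
        return [1.0 / (1.0 + math.exp(-logits[0]))]
    m = max(logits)                              # numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```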
### Problem type (e.g. `multi_label_classification`)
You can set the problem type via `problem_type` in the Hugging Face config. The supported problem types are: `single_label_classification`, `multi_label_classification`, and `regression`.
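One way to override this at load time is vLLM's `hf_overrides` argument, which patches the Hugging Face config. A sketch, assuming a recent vLLM version and an illustrative checkpoint name:

```python
from vllm import LLM

# Illustrative checkpoint; replace with your classification model.
llm = LLM(
    model="my-org/my-classifier",
    runner="pooling",
    # Patch the HF config so the model is treated as multi-label.
    hf_overrides={"problem_type": "multi_label_classification"},
)
```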
This aligns with the Transformers [ForSequenceClassificationLoss](https://github.com/huggingface/transformers/blob/57bb6db6ee4cfaccc45b8d474dfad5a17811ca60/src/transformers/loss/loss_utils.py#L92), which selects the loss and activation based on `problem_type`.
### Logit bias
You can set the logit bias (aka `sigmoid_normalize`) via the `logit_bias` parameter in `vllm.config.PoolerConfig`.
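A sketch of passing the parameter through `override_pooler_config`, assuming a recent vLLM version; the checkpoint name and the bias value are illustrative placeholders:

```python
from vllm import LLM
from vllm.config import PoolerConfig

llm = LLM(
    model="BAAI/bge-reranker-v2-m3",   # illustrative reranker checkpoint
    runner="pooling",
    # Hypothetical value: shifts the logits before the activation is applied.
    override_pooler_config=PoolerConfig(logit_bias=1.0),
)
```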
## Removed Features
### Remove softmax from PoolingParams
`softmax` and `activation` have been removed from `PoolingParams`. Use `use_activation` instead, since `classify` and `token_classify` are allowed to use any activation function.