# Classification Usages

Classification involves predicting which predefined category, class, or label best corresponds to a given input.

## Summary

- Model Usage: (sequence) classification
- Pooling Task: `classify`
- Offline APIs:
    - `LLM.classify(...)`
    - `LLM.encode(..., pooling_task="classify")`
- Online APIs:
    - [Classification API](classify.md#online-serving) (`/classify`)
    - Pooling API (`/pooling`)

The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.

Many classification models support both (sequence) classification and token classification. For further details on token classification, please refer to [this page](token_classify.md).

Only a classification model whose `num_labels` equals 1 can be used as a scoring model with its scoring API enabled; please refer to [this page](scoring.md).

## Typical Use Cases

### Classification

The most fundamental application of classification models is to categorize input data into predefined classes.

## Supported Models

### Text-only Models

| Architecture | Models | Example HF Models | [LoRA](../../features/lora.md) | [PP](../../serving/parallelism_scaling.md) |
| ------------ | ------ | ----------------- | ------------------------------ | ------------------------------------------ |
| `ErnieForSequenceClassification` | BERT-like Chinese ERNIE | `Forrest20231206/ernie-3.0-base-zh-cls` | | |
| `GPT2ForSequenceClassification` | GPT2 | `nie3e/sentiment-polish-gpt2-small` | | |
| `Qwen2ForSequenceClassification`C | Qwen2-based | `jason9693/Qwen2.5-1.5B-apeach` | | |
| `*Model`C, `*ForCausalLM`C, etc. | Generative models | N/A | \* | \* |

### Multimodal Models
!!! note
    For more information about multimodal model inputs, see [this page](../supported_models.md#list-of-multimodal-language-models).

| Architecture | Models | Inputs | Example HF Models | [LoRA](../../features/lora.md) | [PP](../../serving/parallelism_scaling.md) |
| ------------ | ------ | ------ | ----------------- | ------------------------------ | ------------------------------------------ |
| `Qwen2_5_VLForSequenceClassification`C | Qwen2_5_VL-based | T + IE+ + VE+ | `muziyongshixin/Qwen2.5-VL-7B-for-VideoCls` | | |
| `*ForConditionalGeneration`C, `*ForCausalLM`C, etc. | Generative models | \* | N/A | \* | \* |

C Automatically converted into a classification model via `--convert classify`. ([details](./README.md#model-conversion))
\* Feature support is the same as that of the original model.

If your model is not in the above list, we will try to automatically convert it using [as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.

### Cross-encoder Models

Cross-encoder (aka reranker) models are a subset of classification models that accept two prompts as input and output `num_labels` equal to 1. Most classification models can also be used as [cross-encoder models](scoring.md#cross-encoder-models). For more information on cross-encoder models, please refer to [this page](scoring.md).

--8<-- "docs/models/pooling_models/scoring.md:supported-cross-encoder-models"

### Reward Models

(Sequence) classification models can also be used as reward models. For more information, see [Reward Models](reward.md).

--8<-- "docs/models/pooling_models/reward.md:supported-sequence-reward-models"

## Offline Inference

### Pooling Parameters

The following [pooling parameters][vllm.PoolingParams] are supported.
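As a reference point for what pooling produces, the default conversion described under Supported Models (a linear classification head over the last token's hidden state, followed by softmax) can be sketched in plain NumPy. All shapes, weights, and values below are illustrative stand-ins, not the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size, num_labels, seq_len = 8, 2, 5

# Toy stand-ins for the model's final hidden states and the
# converted classification head's weight matrix.
hidden_states = rng.normal(size=(seq_len, hidden_size))
score_weight = rng.normal(size=(hidden_size, num_labels))

# Default behavior of the converted model: pool the LAST token only...
last_hidden = hidden_states[-1]      # shape: (hidden_size,)
logits = last_hidden @ score_weight  # shape: (num_labels,)

# ...then apply softmax to obtain class probabilities.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

print(probs)  # a probability vector over num_labels classes
```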
```python
--8<-- "vllm/pooling_params.py:common-pooling-params"
--8<-- "vllm/pooling_params.py:classify-pooling-params"
```

### `LLM.classify`

The [classify][vllm.LLM.classify] method outputs a probability vector for each prompt.

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.classify("Hello, my name is")

probs = output.outputs.probs
print(f"Class Probabilities: {probs!r} (size={len(probs)})")
```

A code example can be found here: [examples/basic/offline_inference/classify.py](../../../examples/basic/offline_inference/classify.py)

### `LLM.encode`

The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM. Set `pooling_task="classify"` when using `LLM.encode` for classification models:

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.encode("Hello, my name is", pooling_task="classify")

data = output.outputs.data
print(f"Data: {data!r}")
```

## Online Serving

### Classification API

The online `/classify` API is similar to `LLM.classify`.

#### Completion Parameters

The following Classification API parameters are supported:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```

The following extra parameters are supported:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```

#### Chat Parameters

For chat-like input (i.e. if `messages` is passed), the following parameters are supported:
??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```

The following extra parameters are supported instead:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```

#### Example Requests

Code example: [examples/pooling/classify/classification_online.py](../../../examples/pooling/classify/classification_online.py)

You can classify multiple texts by passing an array of strings:

```bash
curl -v "http://127.0.0.1:8000/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": [
      "Loved the new café—coffee was great.",
      "This update broke everything. Frustrating."
    ]
  }'
```

??? console "Response"

    ```json
    {
      "id": "classify-7c87cac407b749a6935d8c7ce2a8fba2",
      "object": "list",
      "created": 1745383065,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        },
        {
          "index": 1,
          "label": "Spoiled",
          "probs": [
            0.26448777318000793,
            0.7355121970176697
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 20,
        "total_tokens": 20,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```

You can also pass a string directly to the `input` field:

```bash
curl -v "http://127.0.0.1:8000/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": "Loved the new café—coffee was great."
  }'
```
??? console "Response"

    ```json
    {
      "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
      "object": "list",
      "created": 1745383213,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 10,
        "total_tokens": 10,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```

## More examples

More examples can be found here: [examples/pooling/classify](../../../examples/pooling/classify)

## Supported Features

### Enable/disable activation

You can enable or disable the activation function via `use_activation`.

### Problem type (e.g. `multi_label_classification`)

You can modify the problem type via `problem_type` in the Hugging Face config. The supported problem types are `single_label_classification`, `multi_label_classification`, and `regression`. This behavior is aligned with the transformers [ForSequenceClassificationLoss](https://github.com/huggingface/transformers/blob/57bb6db6ee4cfaccc45b8d474dfad5a17811ca60/src/transformers/loss/loss_utils.py#L92).

### Logit bias

You can modify the logit bias (aka `sigmoid_normalize`) via the `logit_bias` parameter in `vllm.config.PoolerConfig`.

## Removed Features

### Remove softmax from PoolingParams

We have removed `softmax` and `activation` from `PoolingParams`. Use `use_activation` instead, since we allow `classify` and `token_classify` to use any activation function.
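To make the effect of `problem_type` concrete, here is a small sketch of how each problem type conventionally maps to an output activation, following the transformers convention referenced above. This is an illustration of the convention, not vLLM's internal code:

```python
import numpy as np

def activate(logits: np.ndarray, problem_type: str) -> np.ndarray:
    """Illustrative mapping from problem_type to output activation."""
    if problem_type == "single_label_classification":
        # Mutually exclusive classes: softmax over all labels.
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()
    if problem_type == "multi_label_classification":
        # Independent labels: element-wise sigmoid per label.
        return 1.0 / (1.0 + np.exp(-logits))
    if problem_type == "regression":
        # Raw scores are returned unchanged.
        return logits
    raise ValueError(f"unknown problem_type: {problem_type}")

logits = np.array([2.0, -1.0, 0.5])
print(activate(logits, "single_label_classification"))  # sums to 1
print(activate(logits, "multi_label_classification"))   # each value in (0, 1)
```

Note that softmax couples the labels (the probabilities compete and sum to 1), whereas sigmoid scores each label independently, which is why multi-label classification uses the latter.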