Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com> Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# Classification Usages
Classification involves predicting which predefined category, class, or label best corresponds to a given input.

## Summary

- Model Usage: (sequence) classification
- Pooling Task: `classify`
- Offline APIs:
    - `LLM.classify(...)`
    - `LLM.encode(..., pooling_task="classify")`
- Online APIs:
    - [Classification API](classify.md#online-serving) (`/classify`)
    - Pooling API (`/pooling`)

The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.
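As a toy illustration of the two output granularities (the numbers below are invented, not real model output):

```python
# Toy illustration of output granularity; the values are made up.
tokens = ["vLLM", "is", "fast"]
num_labels = 2

# (Sequence) classification: one probability vector for the whole sequence.
sequence_result = [0.9, 0.1]  # shape: (num_labels,)

# Token classification: one probability vector per token.
token_result = [
    [0.8, 0.2],  # "vLLM"
    [0.3, 0.7],  # "is"
    [0.6, 0.4],  # "fast"
]  # shape: (num_tokens, num_labels)

assert len(sequence_result) == num_labels
assert len(token_result) == len(tokens)
```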

Many classification models support both (sequence) classification and token classification. For further details on token classification, please refer to [this page](token_classify.md).

A classification model can be used as a scoring model, with its scoring API enabled, only when it outputs `num_labels` equal to 1; please refer to [this page](scoring.md).

## Typical Use Cases

### Classification

The most fundamental application of classification models is to categorize input data into predefined classes.

## Supported Models

### Text-only Models

| Architecture | Models | Example HF Models | [LoRA](../../features/lora.md) | [PP](../../serving/parallelism_scaling.md) |
| ------------ | ------ | ----------------- | ------------------------------ | ------------------------------------------ |
| `ErnieForSequenceClassification` | BERT-like Chinese ERNIE | `Forrest20231206/ernie-3.0-base-zh-cls` | | |
| `GPT2ForSequenceClassification` | GPT2 | `nie3e/sentiment-polish-gpt2-small` | | |
| `Qwen2ForSequenceClassification`<sup>C</sup> | Qwen2-based | `jason9693/Qwen2.5-1.5B-apeach` | | |
| `*Model`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | N/A | \* | \* |

### Multimodal Models

!!! note
    For more information about multimodal model inputs, see [this page](../supported_models.md#list-of-multimodal-language-models).

| Architecture | Models | Inputs | Example HF Models | [LoRA](../../features/lora.md) | [PP](../../serving/parallelism_scaling.md) |
| ------------ | ------ | ------ | ----------------- | ------------------------------ | ------------------------------------------ |
| `Qwen2_5_VLForSequenceClassification`<sup>C</sup> | Qwen2_5_VL-based | T + I<sup>E+</sup> + V<sup>E+</sup> | `muziyongshixin/Qwen2.5-VL-7B-for-VideoCls` | | |
| `*ForConditionalGeneration`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | \* | N/A | \* | \* |

<sup>C</sup> Automatically converted into a classification model via `--convert classify`. ([details](./README.md#model-conversion))

\* Feature support is the same as that of the original model.

If your model is not in the above list, we will try to automatically convert it using [as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.

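As a plain-Python sketch of what "softmax over the last token's outputs" means (toy numbers; the real computation happens inside vLLM on the model's classification head):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy per-token classification outputs for a 3-token prompt, num_labels=2.
per_token_logits = [
    [0.2, -0.1],
    [1.0, 0.3],
    [2.0, -1.0],  # last token
]

# (Sequence) classification keeps only the last token's outputs
# and softmaxes them into class probabilities.
probs = softmax(per_token_logits[-1])
print(probs)  # two probabilities summing to 1
```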
### Cross-encoder Models

Cross-encoder (aka reranker) models are a subset of classification models that accept two prompts as input and output `num_labels` equal to 1. Most classification models can also be used as [cross-encoder models](scoring.md#cross-encoder-models). For more information on cross-encoder models, please refer to [this page](scoring.md).

--8<-- "docs/models/pooling_models/scoring.md:supported-cross-encoder-models"

### Reward Models

(Sequence) classification models can also be used as reward models. For more information, see [Reward Models](reward.md).

--8<-- "docs/models/pooling_models/reward.md:supported-sequence-reward-models"

## Offline Inference

### Pooling Parameters

The following [pooling parameters][vllm.PoolingParams] are supported.

```python
--8<-- "vllm/pooling_params.py:common-pooling-params"
--8<-- "vllm/pooling_params.py:classify-pooling-params"
```

### `LLM.classify`

The [classify][vllm.LLM.classify] method outputs a probability vector for each prompt.

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.classify("Hello, my name is")

probs = output.outputs.probs
print(f"Class Probabilities: {probs!r} (size={len(probs)})")
```

A code example can be found here: [examples/basic/offline_inference/classify.py](../../../examples/basic/offline_inference/classify.py)

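To turn the probability vector into a human-readable label, you can take the argmax and look it up in the model's `id2label` mapping from its Hugging Face config. A sketch in plain Python (the `id2label` mapping below is illustrative, not the actual labels of the example model):

```python
# Hypothetical id2label mapping; real models define this in their HF config.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = [0.13, 0.87]  # e.g. output.outputs.probs from LLM.classify

# Argmax over the probability vector, then map the index to its label.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
predicted_label = id2label[predicted_index]
print(predicted_label)  # POSITIVE
```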
### `LLM.encode`

The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.

Set `pooling_task="classify"` when using `LLM.encode` for classification models:

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.encode("Hello, my name is", pooling_task="classify")

data = output.outputs.data
print(f"Data: {data!r}")
```

## Online Serving

### Classification API

The online `/classify` API is similar to `LLM.classify`.

#### Completion Parameters

The following Classification API parameters are supported:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```

The following extra parameters are supported:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```

#### Chat Parameters

For chat-like input (i.e. if `messages` is passed), the following parameters are supported:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```

The following extra parameters are supported instead:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```

#### Example Requests

Code example: [examples/pooling/classify/classification_online.py](../../../examples/pooling/classify/classification_online.py)

You can classify multiple texts by passing an array of strings:

```bash
curl -v "http://127.0.0.1:8000/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": [
      "Loved the new café—coffee was great.",
      "This update broke everything. Frustrating."
    ]
  }'
```

??? console "Response"

    ```json
    {
      "id": "classify-7c87cac407b749a6935d8c7ce2a8fba2",
      "object": "list",
      "created": 1745383065,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        },
        {
          "index": 1,
          "label": "Spoiled",
          "probs": [
            0.26448777318000793,
            0.7355121970176697
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 20,
        "total_tokens": 20,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```

|
|
|
|
You can also pass a string directly to the `input` field:
|
|
|
|
```bash
|
|
curl -v "http://127.0.0.1:8000/classify" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"model": "jason9693/Qwen2.5-1.5B-apeach",
|
|
"input": "Loved the new café—coffee was great."
|
|
}'
|
|
```

??? console "Response"

    ```json
    {
      "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
      "object": "list",
      "created": 1745383213,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 10,
        "total_tokens": 10,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```

## More Examples

More examples can be found here: [examples/pooling/classify](../../../examples/pooling/classify)

## Supported Features

### Enable/disable activation

You can enable or disable activation via `use_activation`.

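With activation enabled, classification outputs are passed through an activation function such as softmax to become probabilities; with it disabled, you get the raw logits. A plain-Python illustration of the difference between the two settings (toy logits; this is not vLLM code):

```python
import math

logits = [2.0, -1.0]  # toy classification logits

# use_activation=False: the raw logits are returned unchanged.
raw = logits

# use_activation=True (with a softmax activation): outputs become
# probabilities that sum to 1.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
activated = [e / sum(exps) for e in exps]

assert raw == [2.0, -1.0]
assert abs(sum(activated) - 1.0) < 1e-9
```

In vLLM you would control this via the `use_activation` pooling parameter rather than computing it yourself.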
### Problem type (e.g. `multi_label_classification`)

You can set the problem type via `problem_type` in the Hugging Face config. The supported problem types are: `single_label_classification`, `multi_label_classification`, and `regression`.

The implementation is aligned with Transformers' [ForSequenceClassificationLoss](https://github.com/huggingface/transformers/blob/57bb6db6ee4cfaccc45b8d474dfad5a17811ca60/src/transformers/loss/loss_utils.py#L92).

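Assuming the usual activation conventions for these problem types (softmax across labels for single-label, an independent sigmoid per label for multi-label, and no activation for regression), the three settings differ as follows in plain Python (toy logits):

```python
import math

logits = [1.2, -0.3, 0.8]  # toy logits for 3 labels

# single_label_classification: softmax across labels
# (probabilities sum to 1; exactly one class is expected).
m = max(logits)
exps = [math.exp(x - m) for x in logits]
single_label = [e / sum(exps) for e in exps]

# multi_label_classification: independent sigmoid per label
# (each probability is in [0, 1]; they need not sum to 1).
multi_label = [1 / (1 + math.exp(-x)) for x in logits]

# regression: outputs are returned as-is.
regression = logits

assert abs(sum(single_label) - 1.0) < 1e-9
assert all(0.0 < p < 1.0 for p in multi_label)
```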
### Logit bias

You can set the logit bias (aka `sigmoid_normalize`) via the `logit_bias` parameter in `vllm.config.PoolerConfig`.

## Removed Features

### Remove softmax from PoolingParams

We have removed `softmax` and `activation` from `PoolingParams`. Use `use_activation` instead, since we allow `classify` and `token_classify` to use any activation function.