# Token Classification Usages

## Summary

- Model Usage: token classification
- Pooling Tasks: `token_classify`
- Offline APIs: `LLM.encode(..., pooling_task="token_classify")`
- Online APIs: Pooling API (`/pooling`)
The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.
Many classification models support both (sequence) classification and token classification. For further details on (sequence) classification, please refer to this page.
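The difference in output granularity can be illustrated with a small sketch. The token strings and probability values below are dummy placeholders, not real model outputs:

```python
# Illustrative only: dummy probabilities stand in for real model outputs.
tokens = ["Hello", ",", "my", "name", "is", "Alice"]

# Sequence classification: one probability vector for the whole input.
sequence_output = [0.1, 0.7, 0.2]

# Token classification: one probability vector per token.
token_output = [[0.8, 0.1, 0.1] for _ in tokens]

assert len(sequence_output) == 3           # a single result for the sequence
assert len(token_output) == len(tokens)    # one result per token
```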
!!! note
    Pooling multitask support is deprecated and will be removed in v0.20. When the default pooling task (`classify`) is not
    what you want, you need to manually specify it via `PoolerConfig(task="token_classify")` offline or
    `--pooler-config.task token_classify` online.
## Typical Use Cases

### Named Entity Recognition (NER)
For implementation examples, see:

- Offline: examples/pooling/token_classify/ner_offline.py
- Online: examples/pooling/token_classify/ner_online.py
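To sketch the post-processing step (this is not the example script above): NER takes the per-token class probabilities and maps each token to its highest-scoring label. The `id2label` mapping and the probabilities below are illustrative assumptions; in a real pipeline the probabilities come from `LLM.encode(..., pooling_task="token_classify")`:

```python
# Minimal NER decoding sketch: per-token probabilities -> BIO labels.
# id2label and the probability values are made up for illustration.
def decode_labels(token_probs, id2label):
    """Pick the highest-scoring label id for each token."""
    return [id2label[max(range(len(p)), key=p.__getitem__)] for p in token_probs]

id2label = {0: "O", 1: "B-PER", 2: "I-PER"}  # assumed label map
token_probs = [
    [0.9, 0.05, 0.05],  # "Hello" -> O
    [0.1, 0.8, 0.1],    # "Alice" -> B-PER
]
print(decode_labels(token_probs, id2label))  # -> ['O', 'B-PER']
```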
### Forced Alignment

Forced alignment takes audio and reference text as input and produces word-level timestamps.

- Offline: examples/pooling/token_classify/forced_alignment_offline.py
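As a rough intuition for how per-frame token predictions become word-level timestamps, consider this sketch. The frame rate and the alignment sequence are made-up values, not Qwen3-ForcedAligner output:

```python
# Illustrative sketch: merge consecutive frame-level labels into
# (word, start_ms, end_ms) spans. Frame duration and labels are made up.
def frames_to_spans(frame_labels, frame_ms):
    """Merge runs of identical labels into (label, start_ms, end_ms) spans."""
    spans = []
    for i, label in enumerate(frame_labels):
        if spans and spans[-1][0] == label:
            spans[-1] = (label, spans[-1][1], (i + 1) * frame_ms)
        else:
            spans.append((label, i * frame_ms, (i + 1) * frame_ms))
    return [s for s in spans if s[0] is not None]  # drop silence frames

frames = [None, "hello", "hello", None, "world", "world", "world"]
print(frames_to_spans(frames, frame_ms=20))
# -> [('hello', 20, 60), ('world', 80, 140)]
```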
### Sparse Retrieval (Lexical Matching)

The BAAI/bge-m3 model leverages token classification for sparse retrieval. For more information, see this page.
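Conceptually, the per-token weights produced by token classification form a sparse `{token_id: weight}` vector per text. The token ids and weights below are made up, and the scoring rule shown (summing products of weights for shared token ids) is one common lexical-matching convention, not necessarily the exact BGE-M3 formula:

```python
# Sparse lexical matching sketch with made-up token ids and weights.
def lexical_score(query_weights, doc_weights):
    """Sum the weight products over token ids shared by query and document."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

query = {101: 0.5, 2009: 0.3}           # assumed sparse query vector
doc_a = {101: 0.4, 2009: 0.2, 77: 0.1}  # overlaps on tokens 101 and 2009
doc_b = {42: 0.9}                       # no overlap

print(lexical_score(query, doc_a))  # ~0.26 (0.5*0.4 + 0.3*0.2)
print(lexical_score(query, doc_b))  # 0 (no shared tokens)
```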
## Supported Models

| Architecture | Models | Example HF Models | LoRA | PP |
|---|---|---|---|---|
| `BertForTokenClassification` | BERT-based | `boltuix/NeuroBERT-NER` (see note), etc. | | |
| `ErnieForTokenClassification` | BERT-like Chinese ERNIE | `gyr66/Ernie-3.0-base-chinese-finetuned-ner` | | |
| `ModernBertForTokenClassification` | ModernBERT-based | `disham993/electrical-ner-ModernBERT-base` | | |
| `Qwen3ForTokenClassification`<sup>C</sup> | Qwen3-based | `bd2lcco/Qwen3-0.6B-finetuned` | | |
| `*Model`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | N/A | \* | \* |
<sup>C</sup> Automatically converted into a classification model via `--convert classify`. (details)

\* Feature support is the same as that of the original model.
If your model is not in the above list, we will try to automatically convert the model using [as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
## Multimodal Models

!!! note
    For more information about multimodal model inputs, see this page.
| Architecture | Models | Inputs | Example HF Models | LoRA | PP |
|---|---|---|---|---|---|
| `Qwen3ASRForcedAlignerForTokenClassification` | Qwen3-ForcedAligner | T + A<sup>+</sup> | `Qwen/Qwen3-ForcedAligner-0.6B` (see note) | ✅︎ | |
!!! note
    Forced alignment usage requires `--hf-overrides '{"architectures": ["Qwen3ASRForcedAlignerForTokenClassification"]}'`.
    Please refer to examples/pooling/token_classify/forced_alignment_offline.py.
## As Reward Models

Token classification models can also be used as reward models. For details on reward models, see Reward Models.
--8<-- "docs/models/pooling_models/reward.md:supported-token-reward-models"
## Offline Inference

### Pooling Parameters
The following [pooling parameters][vllm.PoolingParams] are supported.
--8<-- "vllm/pooling_params.py:common-pooling-params"
--8<-- "vllm/pooling_params.py:classify-pooling-params"
### `LLM.encode`
The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
Set `pooling_task="token_classify"` when using `LLM.encode` for token classification models:

```python
from vllm import LLM

llm = LLM(model="boltuix/NeuroBERT-NER", runner="pooling")
(output,) = llm.encode("Hello, my name is", pooling_task="token_classify")

data = output.outputs.data
print(f"Data: {data!r}")
```
## Online Serving

Please refer to the Pooling API and use `"task": "token_classify"`.
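As a request-body sketch, the task is passed alongside the model and input in the JSON payload. The server URL and model name below are placeholders for your own deployment:

```python
# Build a pooling request body with the token_classify task.
# The URL and model name are placeholders, not fixed values.
import json

payload = {
    "model": "boltuix/NeuroBERT-NER",  # placeholder model
    "input": "Hello, my name is",
    "task": "token_classify",
}

body = json.dumps(payload)
# With a running server you would POST this to the /pooling endpoint,
# e.g.: requests.post("http://localhost:8000/pooling", json=payload)
print(body)
```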
## More Examples

More examples can be found here: examples/pooling/token_classify
## Supported Features
Token classification features should be consistent with (sequence) classification. For more information, see this page.