# Token Classification Usages

## Summary

- Model Usage: token classification
- Pooling Tasks: `token_classify`
- Offline APIs: `LLM.encode(..., pooling_task="token_classify")`
- Online APIs: Pooling API (`/pooling`)
The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.
Many classification models support both (sequence) classification and token classification. For further details on (sequence) classification, please refer to this page.
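The difference in output granularity can be illustrated with a small sketch. The token strings and probability values below are dummy placeholders, not real model outputs:

```python
# Illustrative only: dummy probabilities stand in for real model outputs.
tokens = ["Hello", ",", "my", "name", "is", "Alice"]

# Sequence classification: one probability vector for the whole input.
sequence_output = [0.1, 0.7, 0.2]

# Token classification: one probability vector per token.
token_output = [[0.8, 0.1, 0.1] for _ in tokens]

assert len(sequence_output) == 3           # a single result for the sequence
assert len(token_output) == len(tokens)    # one result per token
```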
!!! note
    Pooling multitask support is deprecated and will be removed in v0.20. When the default pooling task (`classify`) is not
    what you want, you need to manually specify it via `PoolerConfig(task="token_classify")` offline or
    `--pooler-config.task token_classify` online.
## Typical Use Cases

### Named Entity Recognition (NER)
For implementation examples, see:

- Offline: examples/pooling/token_classify/ner_offline.py
- Online: examples/pooling/token_classify/ner_online.py
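To sketch the post-processing step (this is not the example script above): NER takes the per-token class probabilities and maps each token to its highest-scoring label. The `id2label` mapping and the probabilities below are illustrative assumptions; in a real pipeline the probabilities come from `LLM.encode(..., pooling_task="token_classify")`:

```python
# Minimal NER decoding sketch: per-token probabilities -> BIO labels.
# id2label and the probability values are made up for illustration.
def decode_labels(token_probs, id2label):
    """Pick the highest-scoring label id for each token."""
    return [id2label[max(range(len(p)), key=p.__getitem__)] for p in token_probs]

id2label = {0: "O", 1: "B-PER", 2: "I-PER"}  # assumed label map
token_probs = [
    [0.9, 0.05, 0.05],  # "Hello" -> O
    [0.1, 0.8, 0.1],    # "Alice" -> B-PER
]
print(decode_labels(token_probs, id2label))  # -> ['O', 'B-PER']
```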
### Forced Alignment

Forced alignment takes audio and reference text as input and produces word-level timestamps.

- Offline: examples/pooling/token_classify/forced_alignment_offline.py
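As a rough intuition for how per-frame token predictions become word-level timestamps, consider this sketch. The frame rate and the alignment sequence are made-up values, not Qwen3-ForcedAligner output:

```python
# Illustrative sketch: merge consecutive frame-level labels into
# (word, start_ms, end_ms) spans. Frame duration and labels are made up.
def frames_to_spans(frame_labels, frame_ms):
    """Merge runs of identical labels into (label, start_ms, end_ms) spans."""
    spans = []
    for i, label in enumerate(frame_labels):
        if spans and spans[-1][0] == label:
            spans[-1] = (label, spans[-1][1], (i + 1) * frame_ms)
        else:
            spans.append((label, i * frame_ms, (i + 1) * frame_ms))
    return [s for s in spans if s[0] is not None]  # drop silence frames

frames = [None, "hello", "hello", None, "world", "world", "world"]
print(frames_to_spans(frames, frame_ms=20))
# -> [('hello', 20, 60), ('world', 80, 140)]
```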
### Sparse Retrieval (Lexical Matching)

The BAAI/bge-m3 model leverages token classification for sparse retrieval. For more information, see this page.
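Conceptually, the per-token weights produced by token classification form a sparse `{token_id: weight}` vector per text. The token ids and weights below are made up, and the scoring rule shown (summing products of weights for shared token ids) is one common lexical-matching convention, not necessarily the exact BGE-M3 formula:

```python
# Sparse lexical matching sketch with made-up token ids and weights.
def lexical_score(query_weights, doc_weights):
    """Sum the weight products over token ids shared by query and document."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

query = {101: 0.5, 2009: 0.3}           # assumed sparse query vector
doc_a = {101: 0.4, 2009: 0.2, 77: 0.1}  # overlaps on tokens 101 and 2009
doc_b = {42: 0.9}                       # no overlap

print(lexical_score(query, doc_a))  # ~0.26 (0.5*0.4 + 0.3*0.2)
print(lexical_score(query, doc_b))  # 0 (no shared tokens)
```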
## Supported Models

| Architecture | Models | Example HF Models | LoRA | PP |
|---|---|---|---|---|
| `BertForTokenClassification` | BERT-based | `boltuix/NeuroBERT-NER` (see note), etc. | | |
| `ErnieForTokenClassification` | BERT-like Chinese ERNIE | `gyr66/Ernie-3.0-base-chinese-finetuned-ner` | | |
| `ModernBertForTokenClassification` | ModernBERT-based | `disham993/electrical-ner-ModernBERT-base` | | |
| `Qwen3ForTokenClassification`<sup>C</sup> | Qwen3-based | `bd2lcco/Qwen3-0.6B-finetuned` | | |
| `*Model`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | N/A | \* | \* |
<sup>C</sup> Automatically converted into a classification model via `--convert classify`. (details)

\* Feature support is the same as that of the original model.
If your model is not in the above list, we will try to automatically convert the model using [as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
## Multimodal Models

!!! note
    For more information about multimodal model inputs, see this page.
| Architecture | Models | Inputs | Example HF Models | LoRA | PP |
|---|---|---|---|---|---|
| `Qwen3ASRForcedAlignerForTokenClassification` | Qwen3-ForcedAligner | T + A<sup>+</sup> | `Qwen/Qwen3-ForcedAligner-0.6B` (see note) | ✅︎ | |
!!! note
    Forced alignment usage requires `--hf-overrides '{"architectures": ["Qwen3ASRForcedAlignerForTokenClassification"]}'`.
    Please refer to examples/pooling/token_classify/forced_alignment_offline.py.
## As Reward Models

Token classification models can also be used as reward models. For details on reward models, see Reward Models.
--8<-- "docs/models/pooling_models/reward.md:supported-token-reward-models"
## Offline Inference

### Pooling Parameters
The following [pooling parameters][vllm.PoolingParams] are supported.
--8<-- "vllm/pooling_params.py:common-pooling-params"
--8<-- "vllm/pooling_params.py:classify-pooling-params"
### `LLM.encode`
The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
Set `pooling_task="token_classify"` when using `LLM.encode` for token classification models:

```python
from vllm import LLM

llm = LLM(model="boltuix/NeuroBERT-NER", runner="pooling")
(output,) = llm.encode("Hello, my name is", pooling_task="token_classify")

data = output.outputs.data
print(f"Data: {data!r}")
```
## Online Serving

Please refer to the Pooling API and use `"task": "token_classify"`.
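As a request-body sketch, the task is passed alongside the model and input in the JSON payload. The server URL and model name below are placeholders for your own deployment:

```python
# Build a pooling request body with the token_classify task.
# The URL and model name are placeholders, not fixed values.
import json

payload = {
    "model": "boltuix/NeuroBERT-NER",  # placeholder model
    "input": "Hello, my name is",
    "task": "token_classify",
}

body = json.dumps(payload)
# With a running server you would POST this to the /pooling endpoint,
# e.g.: requests.post("http://localhost:8000/pooling", json=payload)
print(body)
```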
## More Examples

More examples can be found here: examples/pooling/token_classify
## Supported Features
Token classification features should be consistent with (sequence) classification. For more information, see this page.