The difference between the (sequence) embedding task and the token embedding task is that (sequence) embedding outputs one embedding per sequence, while token embedding outputs an embedding for each token.
Many embedding models support both (sequence) embedding and token embedding. For further details on (sequence) embedding, please refer to [this page](embed.md).
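To make the distinction concrete, here is a minimal sketch in plain Python (not using vLLM): the toy hidden states are made up, and mean pooling is just one common way to reduce token embeddings to a single sequence embedding.

```python
# Toy hidden states for a 3-token sequence, hidden size 2 (illustrative numbers).
hidden_states = [
    [1.0, 2.0],  # token 0
    [3.0, 4.0],  # token 1
    [5.0, 0.0],  # token 2
]

# Token embedding: one vector per token -- the hidden states themselves.
token_embeddings = hidden_states

# (Sequence) embedding: a single vector for the whole sequence,
# here via mean pooling over tokens (one common pooling strategy).
dim = len(hidden_states[0])
sequence_embedding = [
    sum(tok[d] for tok in hidden_states) / len(hidden_states) for d in range(dim)
]

print(len(token_embeddings))  # one embedding per token
print(sequence_embedding)     # one embedding for the sequence
```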
Similarity scores can be computed using late interaction between two input prompts via the score API. For more information, see [Score API](scoring.md).
### Extract last hidden states
Models of any architecture can be converted into embedding models using `--convert embed`. Token embedding can then be used to extract the last hidden states from these models.
<sup>C</sup> Automatically converted into an embedding model via `--convert embed`. ([details](./README.md#model-conversion))
\* Feature support is the same as that of the original model.
If your model is not in the above list, we will try to automatically convert the model using [as_embedding_model][vllm.model_executor.models.adapters.as_embedding_model].
--8<-- [end:supported-token-embed-models]
## Offline Inference
### Pooling Parameters
The following [pooling parameters][vllm.PoolingParams] are supported.
```python
from vllm import LLM

# The model name is illustrative; any model that supports token embedding works.
llm = LLM(model="intfloat/e5-small", runner="pooling")

(output,) = llm.encode("Hello, my name is", pooling_task="token_embed")
data = output.outputs.data
print(f"Data: {data!r}")
```
### `LLM.score`
The [score][vllm.LLM.score] method outputs similarity scores between sentence pairs.
All models that support the token embedding task also support the score API, which computes similarity scores via late interaction between the two input prompts.
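Late interaction (ColBERT-style MaxSim) scores a pair by matching each query token embedding against its most similar document token embedding and summing the matches. A minimal pure-Python sketch; the toy vectors and plain dot-product similarity are illustrative, not vLLM's exact implementation:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim: for each query token, take the best-matching doc token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token embeddings (2-dimensional, illustrative).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

print(late_interaction_score(query, doc))  # 0.9 (best match for q0) + 0.8 (for q1)
```

In practice the score API runs the embedding model on both prompts, performs this matching over the resulting token embeddings, and returns the aggregated score.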