# Features
## Compatibility Matrix

The tables below show which features are compatible with one another, and which features are supported on each hardware platform.
The symbols used have the following meanings:
- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
- ❔ = Unknown or TBD

!!! note
    Click a linked ❌ or 🟠 to open the tracking issue for that unsupported feature or hardware combination.

### Feature x Feature

<style>
  td:not(:first-child) {
    text-align: center !important;
  }
  td {
    padding: 0.5rem !important;
    white-space: nowrap;
  }
  th {
    padding: 0.5rem !important;
    min-width: 0 !important;
  }
  th:not(:first-child) {
    writing-mode: vertical-lr;
    transform: rotate(180deg);
  }
</style>

| Feature | [CP](../configuration/optimization.md#chunked-prefill) | [APC](automatic_prefix_caching.md) | [LoRA](lora.md) | [SD](spec_decode.md) | CUDA graph | [pooling](../models/pooling_models.md) | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | <abbr title="Logprobs">logP</abbr> | <abbr title="Prompt Logprobs">prmpt logP</abbr> | <abbr title="Async Output Processing">async output</abbr> | multi-step | <abbr title="Multimodal Inputs">mm</abbr> | best-of | beam-search | [prompt-embeds](prompt_embeds.md) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP](../configuration/optimization.md#chunked-prefill) | ✅ | | | | | | | | | | | | | | |
| [APC](automatic_prefix_caching.md) | ✅ | ✅ | | | | | | | | | | | | | |
| [LoRA](lora.md) | ✅ | ✅ | ✅ | | | | | | | | | | | | |
| [SD](spec_decode.md) | ✅ | ✅ | ❌ | ✅ | | | | | | | | | | | |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | | | |
| [pooling](../models/pooling_models.md) | 🟠\* | 🟠\* | ✅ | ❌ | ✅ | ✅ | | | | | | | | | |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ❌ | [❌](https://github.com/vllm-project/vllm/issues/7366) | ❌ | [❌](https://github.com/vllm-project/vllm/issues/7366) | ✅ | ✅ | ✅ | | | | | | | | |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | | |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | | | | | | |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | | |
| multi-step | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | | | | |
| [mm](multimodal_inputs.md) | ✅ | ✅ | [🟠](https://github.com/vllm-project/vllm/pull/4194)<sup>^</sup> | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | | |
| best-of | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ✅ | ✅ | | |
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |
| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ |

\* Chunked prefill and prefix caching are only applicable to last-token or all pooling with causal attention.

<sup>^</sup> LoRA is only applicable to the language backbone of multimodal models.
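As an illustration of reading the matrix: the CP, APC, and LoRA rows are all mutually ✅, so those three features can be enabled in the same deployment. A minimal sketch using the `vllm serve` CLI (the model name and adapter path are placeholders, and flag names may vary between vLLM versions):

```shell
# Enable three mutually compatible features together:
# chunked prefill (CP), automatic prefix caching (APC), and LoRA.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --enable-chunked-prefill \
    --enable-prefix-caching \
    --enable-lora \
    --lora-modules my-adapter=/path/to/adapter  # hypothetical adapter path
```

Combinations marked ❌ in the matrix (for example SD together with LoRA) will instead be rejected at engine startup.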
### Feature x Hardware
| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | Intel GPU |
|---|---|---|---|---|---|---|---|---|
| [CP](../configuration/optimization.md#chunked-prefill) | [❌](https://github.com/vllm-project/vllm/issues/2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [APC](automatic_prefix_caching.md) | [❌](https://github.com/vllm-project/vllm/issues/3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [LoRA](lora.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [SD](spec_decode.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/26970) |
| [pooling](../models/pooling_models.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| [mm](multimodal_inputs.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/8477) | ✅ | ✅ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

!!! note
    For information on feature support on Google TPU, please refer to the [TPU-Inference Recommended Models and Features](https://docs.vllm.ai/projects/tpu/en/latest/recommended_models_features/) documentation.
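Where the hardware table marks CUDA graph as unsupported (the ❌ entries in the CPU and Intel GPU columns), graph capture can be disabled explicitly so the engine runs in eager mode. A hedged sketch, assuming the `--enforce-eager` flag behaves as in recent vLLM versions (the model name is a placeholder):

```shell
# On backends without CUDA graph support, disable graph capture
# and execute the model in eager mode instead.
vllm serve meta-llama/Llama-3.1-8B-Instruct --enforce-eager
```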