# Features
## Compatibility Matrix

The tables below show which features are compatible with one another, and which features are supported on each hardware platform.
The symbols used have the following meanings:
- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
- ❔ = Unknown or TBD

!!! note
    Click a linked ❌ or 🟠 to open the tracking issue for that unsupported feature or hardware combination.

### Feature x Feature

<style>
  td:not(:first-child) {
    text-align: center !important;
  }
  td {
    padding: 0.5rem !important;
    white-space: nowrap;
  }
  th {
    padding: 0.5rem !important;
    min-width: 0 !important;
  }
  th:not(:first-child) {
    writing-mode: vertical-lr;
    transform: rotate(180deg);
  }
</style>

| Feature | [CP](../configuration/optimization.md#chunked-prefill) | [APC](automatic_prefix_caching.md) | [LoRA](lora.md) | [SD](spec_decode.md) | CUDA graph | [pooling](../models/pooling_models.md) | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | <abbr title="Logprobs">logP</abbr> | <abbr title="Prompt Logprobs">prmpt logP</abbr> | <abbr title="Async Output Processing">async output</abbr> | multi-step | <abbr title="Multimodal Inputs">mm</abbr> | best-of | beam-search | [prompt-embeds](prompt_embeds.md) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP](../configuration/optimization.md#chunked-prefill) | ✅ | | | | | | | | | | | | | | |
| [APC](automatic_prefix_caching.md) | ✅ | ✅ | | | | | | | | | | | | | |
| [LoRA](lora.md) | ✅ | ✅ | ✅ | | | | | | | | | | | | |
| [SD](spec_decode.md) | ✅ | ✅ | ❌ | ✅ | | | | | | | | | | | |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | | | |
| [pooling](../models/pooling_models.md) | 🟠\* | 🟠\* | ✅ | ❌ | ✅ | ✅ | | | | | | | | | |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ❌ | [❌](https://github.com/vllm-project/vllm/issues/7366) | ❌ | [❌](https://github.com/vllm-project/vllm/issues/7366) | ✅ | ✅ | ✅ | | | | | | | | |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | | |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | | | | | | |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | | |
| multi-step | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | | | | |
| [mm](multimodal_inputs.md) | ✅ | ✅ | [🟠](https://github.com/vllm-project/vllm/pull/4194)<sup>^</sup> | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | | |
| best-of | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ✅ | ✅ | | |
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |
| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ |

\* Chunked prefill and prefix caching are only applicable to last-token or all pooling with causal attention.

<sup>^</sup> LoRA is only applicable to the language backbone of multimodal models.
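As an illustration of reading the matrix: the CP, APC, and LoRA rows are all mutually ✅, so those three features can be enabled in the same deployment. A minimal sketch using the `vllm serve` CLI (the model name and adapter path are placeholders, and flag names may vary between vLLM versions):

```shell
# Enable three mutually compatible features together:
# chunked prefill (CP), automatic prefix caching (APC), and LoRA.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --enable-chunked-prefill \
    --enable-prefix-caching \
    --enable-lora \
    --lora-modules my-adapter=/path/to/adapter  # hypothetical adapter path
```

Combinations marked ❌ in the matrix (for example SD together with LoRA) will instead be rejected at engine startup.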
### Feature x Hardware
| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | Intel GPU |
|---|---|---|---|---|---|---|---|---|
| [CP](../configuration/optimization.md#chunked-prefill) | [❌](https://github.com/vllm-project/vllm/issues/2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [APC](automatic_prefix_caching.md) | [❌](https://github.com/vllm-project/vllm/issues/3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [LoRA](lora.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [SD](spec_decode.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/26970) |
| [pooling](../models/pooling_models.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| [mm](multimodal_inputs.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/8477) | ✅ | ✅ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

!!! note
    For information on feature support on Google TPU, please refer to the [TPU-Inference Recommended Models and Features](https://docs.vllm.ai/projects/tpu/en/latest/recommended_models_features/) documentation.
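Where the hardware table marks CUDA graph as unsupported (the ❌ entries in the CPU and Intel GPU columns), graph capture can be disabled explicitly so the engine runs in eager mode. A hedged sketch, assuming the `--enforce-eager` flag behaves as in recent vLLM versions (the model name is a placeholder):

```shell
# On backends without CUDA graph support, disable graph capture
# and execute the model in eager mode instead.
vllm serve meta-llama/Llama-3.1-8B-Instruct --enforce-eager
```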