[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@@ -288,5 +288,11 @@ Based on the configuration, the content of the multi-modal caches on `P0` and `P
 | shm | Shared Memory Caching | K | N/A | V | `mm_processor_cache_gb * api_server_count` |
 | N/A | Disabled | N/A | N/A | N/A | `0` |
 
 K: Stores the hashes of multi-modal items
 V: Stores the processed tensor data of multi-modal items
+
+## Attention Backend Selection
+
+vLLM supports multiple attention backends optimized for different hardware and use cases. The backend is automatically selected based on your GPU architecture, model type, and configuration, but you can also manually specify one for optimal performance.
+
+For detailed information on available backends, their feature support, and how to configure them, see the [Attention Backend Feature Support](../design/attention_backends.md) documentation.
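
As a companion to the new section, here is a minimal sketch of manually pinning a backend, assuming the `VLLM_ATTENTION_BACKEND` environment variable and the `FLASHINFER` backend name apply to your installed vLLM build; the set of valid backend names varies by release and hardware, so check the linked Attention Backend Feature Support page first.

```python
# Minimal sketch: pinning vLLM's attention backend explicitly.
# Assumes the VLLM_ATTENTION_BACKEND environment variable and the
# "FLASHINFER" backend name are supported by your installed vLLM
# version; automatic selection is used when the variable is unset.
import os

# The backend is chosen at engine initialization, so set the
# variable before importing vLLM.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], sampling)
print(outputs[0].outputs[0].text)
```

Unsetting the variable restores the automatic selection described above, which picks a backend based on GPU architecture, model type, and configuration.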