Files
vllm/vllm/v1/attention/backends
Elvir Crnčević 89138b21cc [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
(cherry picked from commit ef2c4f778d)
2026-03-18 18:44:16 -07:00
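The commit above describes zero-initializing attention output buffers so that the rows added by CUDA-graph batch padding (which the kernel never writes) hold zeros instead of stale memory. A minimal sketch of the idea, using NumPy as a stand-in for GPU tensors and a hypothetical `attention_output` helper (not vLLM's actual code); garbage in the padding is simulated by filling it with NaN:

```python
import numpy as np

def attention_output(batch, padded_size, zero_init=True):
    # Hypothetical sketch: a CUDA-graph-captured kernel writes only the
    # first `batch` rows of a buffer sized for the padded batch.
    d = 4
    if zero_init:
        out = np.zeros((padded_size, d))          # padding rows stay 0.0
    else:
        out = np.full((padded_size, d), np.nan)   # simulate uninitialized memory
    out[:batch] = 1.0                             # rows the kernel actually writes
    return out

safe = attention_output(batch=2, padded_size=4, zero_init=True)
unsafe = attention_output(batch=2, padded_size=4, zero_init=False)
print(np.isnan(safe.sum()))    # False: zero-init keeps reductions finite
print(np.isnan(unsafe.sum()))  # True: garbage in padding rows propagates NaN
```

Any downstream reduction that touches the padded rows (a sum, a norm, a softmax) would propagate the NaN, which is why initializing the whole buffer is the fix rather than masking later.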