biondizzle/vllm
Files at commit ef2c4f778df5aa07a44e663330e2dfdc16927d2a
Path: vllm/vllm/v1/attention
Latest commit: Elvir Crnčević ef2c4f778d [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-19 00:28:37 +00:00
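
The commit title captures the whole fix: a CUDA graph is captured at a fixed batch size, so the attention kernel only writes rows for the real tokens and leaves the padded tail of the output buffer untouched. If that buffer is allocated uninitialized, stale bytes (possibly NaN bit patterns) in the padding can leak into downstream ops. A minimal sketch of the idea in plain PyTorch, with made-up shapes; this is not vLLM's actual buffer-allocation code:

import torch

# Illustrative only. The kernel writes rows [0, num_actual_tokens);
# the padded tail up to the graph's capture size is never written.
num_actual_tokens, padded_tokens, head_dim = 8, 16, 64

# torch.empty() reuses whatever bytes happen to be in the allocation,
# so the padded rows may contain garbage, including NaNs.
out_unsafe = torch.empty(padded_tokens, head_dim)

# Zero-initializing makes every row well-defined even though the
# kernel only fills the first num_actual_tokens rows.
out = torch.zeros(padded_tokens, head_dim)
out[:num_actual_tokens] = torch.randn(num_actual_tokens, head_dim)  # stand-in for kernel output

# Downstream ops that touch the padding (e.g. a tensor-parallel
# all-reduce over the whole buffer) can no longer pick up NaNs.
assert not torch.isnan(out).any()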
..
backends      [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)   2026-03-19 00:28:37 +00:00
ops           [Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597) (see FP8 sketch below)  2026-03-12 08:32:34 -07:00
__init__.py   [V1] Implement vLLM V1 [1/N] (#9289)                                                              2024-10-22 01:24:07 -07:00
backend.py    [Misc] Add float16 to CacheDType (#37199)                                                         2026-03-16 13:24:48 -07:00
selector.py   Reapply [Attention] Refactor check_and_update_config (#35122)                                     2026-03-09 07:17:14 -07:00
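
For context on the ops entry: an FP8 KV cache stores keys and values in 8-bit floating point alongside a scale factor, halving cache memory relative to fp16 at some precision cost. A rough, hypothetical sketch of the quantize/dequantize round trip in PyTorch; the shapes and the per-tensor scaling scheme are illustrative assumptions, not the Triton kernel's actual cache layout:

import torch

# Hypothetical per-tensor FP8 (e4m3) quantization of a key cache.
# Real kernels typically fuse dequantization into the attention math.
k = torch.randn(2, 128, 64)        # [num_heads, seq_len, head_dim]
scale = k.abs().max() / 448.0      # 448 = max normal value of e4m3

k_fp8 = (k / scale).to(torch.float8_e4m3fn)  # stored form: 1 byte/element
k_deq = k_fp8.to(torch.float32) * scale      # recovered for attention

print("max abs quantization error:", (k - k_deq).abs().max().item())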