[Attention] Add FlashInfer Sparse MLA backend (#33451)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
This commit is contained in:
Matthew Bonanni
2026-02-12 12:21:54 -05:00
committed by GitHub
parent 334c715e0f
commit f2c47886fd
24 changed files with 1181 additions and 408 deletions

View File

@@ -36,11 +36,11 @@ batch_specs:
- "q1ks2k" # 1k query, 2k sequence
- "2q1ks4k" # 2 requests: 1k query, 4k sequence
# Available backends: flash, triton, flashinfer
# Available backends: FLASH_ATTN, TRITON_ATTN, FLASHINFER
backends:
- flash
- triton
- flashinfer
- FLASH_ATTN
- TRITON_ATTN
- FLASHINFER
device: "cuda:0"
repeats: 5