[Attention] Add FlashInfer Sparse MLA backend (#33451)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
@@ -36,11 +36,11 @@ batch_specs:
   - "q1ks2k"  # 1k query, 2k sequence
   - "2q1ks4k" # 2 requests: 1k query, 4k sequence
 
-# Available backends: flash, triton, flashinfer
+# Available backends: FLASH_ATTN, TRITON_ATTN, FLASHINFER
 backends:
-  - flash
-  - triton
-  - flashinfer
+  - FLASH_ATTN
+  - TRITON_ATTN
+  - FLASHINFER
 
 device: "cuda:0"
 repeats: 5
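The diff above renames the benchmark's backend identifiers from lowercase strings (`flash`) to enum-style names (`FLASH_ATTN`). A minimal sketch of how a consumer of this config might resolve those names into an enum, failing fast on stale lowercase values — the `Backend` enum and `resolve_backends` helper here are illustrative assumptions, not vLLM's actual API:

```python
from enum import Enum

class Backend(Enum):
    # Hypothetical enum mirroring the backend names in the config diff.
    FLASH_ATTN = "FLASH_ATTN"
    TRITON_ATTN = "TRITON_ATTN"
    FLASHINFER = "FLASHINFER"

# Shape assumed from the YAML fragment in the diff.
config = {
    "backends": ["FLASH_ATTN", "TRITON_ATTN", "FLASHINFER"],
    "device": "cuda:0",
    "repeats": 5,
}

def resolve_backends(names):
    """Map config strings to Backend members; reject unknown names
    (e.g. the old lowercase 'flash') with a clear error."""
    try:
        return [Backend[name] for name in names]
    except KeyError as exc:
        valid = [b.name for b in Backend]
        raise ValueError(f"unknown backend {exc}; expected one of {valid}") from exc

backends = resolve_backends(config["backends"])
print([b.name for b in backends])
```

Using enum lookup by name (`Backend[name]`) means the old lowercase identifiers now fail loudly at config-load time instead of silently selecting no backend.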