[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)

Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Hanjie Qiu
2025-09-10 18:31:10 -04:00
committed by GitHub
parent fba7856581
commit dcb28a332b
8 changed files with 255 additions and 1 deletions

@@ -1504,6 +1504,7 @@ class EngineArgs:
 "FLASH_ATTN_MLA",
 "FLASHINFER",
 "FLASHINFER_VLLM_V1",
+"FLASHINFER_MLA",
 "ROCM_AITER_MLA",
 "TORCH_SDPA_VLLM_V1",
 "FLEX_ATTENTION",