[Attention] Relax the head-dim 512 and paged-KV restrictions for sm90+FA4 (#38835)

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Author: Lain
Date: 2026-04-08 11:23:18 -07:00
Committed by: GitHub
Parent: b55d830ec7
Commit: e24e0a43a4
4 changed files with 29 additions and 10 deletions


@@ -39,7 +39,7 @@ else()
   FetchContent_Declare(
     vllm-flash-attn
     GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-    GIT_TAG c0ec424fd8a546d0cbbf4bf050bbcfe837c55afb
+    GIT_TAG f5bc33cfc02c744d24a2e9d50e6db656de40611c
     GIT_PROGRESS TRUE
     # Don't share the vllm-flash-attn build between build types
     BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
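
For context, the hunk above updates the commit pin of a `FetchContent` dependency. A minimal, self-contained sketch of this pinning pattern is below; the repository URL and tag mirror the updated values in the diff, while the surrounding project scaffolding is illustrative rather than taken from vLLM's actual CMake tree:

```cmake
cmake_minimum_required(VERSION 3.24)
project(fetchcontent_pin_example LANGUAGES CXX)

include(FetchContent)

# Pin the dependency to an exact commit hash (not a branch or tag name),
# so builds are reproducible and only change when the pin is bumped,
# as this commit does.
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG f5bc33cfc02c744d24a2e9d50e6db656de40611c
  GIT_PROGRESS TRUE
)

# Clone (if needed) and add the dependency's CMakeLists to this build.
FetchContent_MakeAvailable(vllm-flash-attn)
```

Bumping such a pin is a one-line change to `GIT_TAG`, which is exactly what the diff shows.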