[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071)

Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
Hashem Hashemi
2025-05-07 22:34:49 -07:00
committed by GitHub
parent 6930a41116
commit 5a499e70d5
4 changed files with 321 additions and 233 deletions


@@ -8,7 +8,7 @@ from vllm.platforms import current_platform
 DTYPES = [torch.bfloat16, torch.float16]
 M = [16, 32, 64, 128, 256, 512, 1024, 4096, 8192]
-K = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192] # k % 8 == 0
+K = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 6144, 8192] # k % 8 == 0
 N = [1, 2, 3, 4]
 SEEDS = [0]
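
Reviewer note: the only change in this hunk adds 6144 to `K`, which still has to satisfy the `k % 8 == 0` constraint in the trailing comment. The sketch below is not part of the commit. It copies the updated lists, checks that constraint, and counts the cases a full Cartesian sweep would produce. The helper names `bad_k` and `build_grid` are hypothetical, the dtypes are plain-string stand-ins so that the snippet runs without torch, and the assumption that the test sweeps the full product of these lists is not confirmed by the lines shown here.

```python
from itertools import product

# Values copied from the updated parameter lists in the hunk above.
# The dtype entries are string stand-ins for torch.bfloat16 / torch.float16.
DTYPE_NAMES = ["bfloat16", "float16"]
M = [16, 32, 64, 128, 256, 512, 1024, 4096, 8192]
K = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 6144, 8192]
N = [1, 2, 3, 4]
SEEDS = [0]


def bad_k(values, multiple=8):
    """Return the entries that violate the `k % 8 == 0` constraint."""
    return [k for k in values if k % multiple != 0]


def build_grid():
    """Enumerate every (n, k, m, dtype, seed) combination.

    Assumption: the real test sweeps the full Cartesian product.
    """
    return list(product(N, K, M, DTYPE_NAMES, SEEDS))


if __name__ == "__main__":
    print("K values violating k % 8 == 0:", bad_k(K))
    print("total parameter combinations:", len(build_grid()))
```

Under that full-product assumption, adding one value to `K` adds 9 * 4 * 2 * 1 = 72 cases to the sweep.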