biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:17:43 +00:00
bafd26707b FMHA HD=64 with BLOCK_MN_B=16, 4 N-tiles per K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:16:29 +00:00
6896d1aebb Update CURRENT_ISSUE: HD=16 done, HD=64 in progress
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:15:24 +00:00
6b9b06647a Clean up HD=64 debug prints, keep register-math PV check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:13:49 +00:00
5c9d471162 Add register-math PV reference for HD=64 debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:12:22 +00:00
43e9efbc2b Fix string literal
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:11:17 +00:00
906be7ce50 Add filtered cosine (exclude near-zero)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:09:57 +00:00
40c83c769a Fix: remove ×2 QK scale correction (MMA scale is 1.0, not 0.5)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:07:56 +00:00
6ea7356fdd Debug: print P values for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:06:39 +00:00
4b052f22a5 Fix: opt into >48KB shared memory for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:03:28 +00:00
7becbfc07e Fix: printf after var declarations
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 15:02:01 +00:00
2d44f8e356 Debug: check if HD=64 kernel starts
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:58:13 +00:00
46e4d07c71 Test PV SS MMA with B=(64,16) BLOCK_MN=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:56:08 +00:00
465e089a2b Add launch error check for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:54:13 +00:00
2fd64c464d FMHA HD=64 with BLOCK_MN_B=64 for V, proper output dimensions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:52:32 +00:00
15ecc1f616 Full FMHA HD=64 with PV SS MMA (SMEM-P)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:50:44 +00:00
5b2e690936 Milestone: Full FMHA HD=16 with PV SS MMA (SMEM-P) — cosine 0.9997
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:49:21 +00:00
78026839b7 Fix V canonical layout: swap g_mn/g_k indices (d=MN, lr=K)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:47:12 +00:00
9a3b43c42b Fix reference to also use uniform P
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:46:23 +00:00
75bdcbf728 Debug: override P with uniform 1/128
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 14:45:16 +00:00
af93c283c7 Enable all 8 PV K-tiles