biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:00:56 +00:00
12c166245d Use kt directly as TMA GMEM coordinate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:59:55 +00:00
8c9d4eb1ef Try Int32(0) + kv_coord += 1 (matching reference pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:58:44 +00:00
7e832aa527 Use kvh.count for GMEM tile coordinate (pipeline-tracked SSA value)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:56:57 +00:00
254f7be884 Fix diag: remove .rank
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:56:24 +00:00
4c1dbfd0f3 Add cute.printf shape diagnostics to example9
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:55:43 +00:00
a476324682 Fix TMA shape diag: use ct tensors for LayoutEnum
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:55:05 +00:00
fd6b1e82d8 TMA shape diag: pure Python, no JIT
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:54:24 +00:00
2f670e33d1 TMA shape diagnostic: exact setup from example9 + shape prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:53:25 +00:00
b64227e5b6 Fix group_modes range in TMA shape diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:52:57 +00:00
be27720cb2 Add TMA shape diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:51:35 +00:00
845ad98b22 Fix TMA indexing: 4-mode tensors, kt at mode 2 (GMEM tile dim)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:50:40 +00:00
61b0501a8b Fix test_fmha_v3_stage_c.py: 8-mode TMA indexing + O rescale (from example9)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:43:12 +00:00
0996ffc1ba Add fmha_v3_stage_c_example10: 8-mode TMA + O rescale + paired-atom epilogue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:36:25 +00:00
328f9b0080 Test n=384
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:36:02 +00:00
2da0a452d1 Quick test n=128,256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:35:41 +00:00
0d3caced47 Add: O rescale (correction_rescale) in softmax loop + remove pk from TMA/MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:32:19 +00:00
c47d229e6a Sweep test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:31:21 +00:00
a751b3baf7 Sweep test: n=128,256,384,512,1024
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:29:05 +00:00
beaf60db5c DOCUMENT: TMA 8-mode indexing — the bug that cost us a full day. README + inline comments.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:21:54 +00:00
27116110ab Fix identity diag: same 8D TMA indexing fix