biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 23:18:41 +00:00
1726f371c1 FIX: 8-None no-op pre-slice opens full TMA coordinate space (8 dims)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 23:08:30 +00:00
2f716a5856 FIX: tma_partition tensors have 4 modes, not 8. Mode 2 is GMEM tile dim.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:58:11 +00:00
02969c15fc Fix test_fmha_v3_stage_c.py: 8-mode TMA indexing (mode 4 = GMEM tile dim)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:57:55 +00:00
80be9af3a0 Fix README: multi-tile was layout bug not JIT bug, add example10, update status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:40:10 +00:00
078071e98a Add diag test with 8-mode TMA indexing from commit 2711611
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:38:09 +00:00
e69ead0c35 auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:36:41 +00:00
beaf60db5c DOCUMENT: TMA 8-mode indexing — the bug that cost us a full day. README + inline comments.
27116110ab Fix identity diag: same 8D TMA indexing fix
bb92af5b0c FIX: Use full 8D indexing for tBgK/tVgV — mode 4 is the GMEM tile dim
2a9f764f8b Diagnostic: check tBgK/tVgV layout strides for degenerate dims
ae173d3963 Test identity diag multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:31:34 +00:00
88b70e56fb DEBUG: print flat_divide shapes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:30:51 +00:00
f64af7b0e0 Switch gK/gV to flat_divide (CUTLASS FMHA reference pattern) for proper TMA strides
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:28:34 +00:00
daa4017505 DEBUG: print tBgK/tVgV layout to check strides
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:27:59 +00:00
be103deb6d Try kt (cutlass.range induction) with correct (None,0,None,0) pre-slice
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:27:01 +00:00
d656598185 SSA-seed kv_coord: n_kv_tiles - n_kv_tiles forces JIT to track as runtime reg
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:26:18 +00:00
5de530fe18 Fix tBgK pre-slice: (None,0,None,0) preserves kv_tiles at mode 2 (matching tVgV)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:25:37 +00:00
1b35e0f967 DEBUG: print TMA partition tensor shapes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:25:02 +00:00
16f60e2dd1 Fix multi-tile TMA: loop-carried kv_coord (CUTLASS reference pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:18:53 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:17:47 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:16:24 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:04:31 +00:00
8138e5c62a Dynamic blk_coord in pre-slice (matching reference pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 22:03:42 +00:00
55cdfad7c6 Pass seqlen_k as kernel arg, derive kv_coord dynamically (force SSA tracking)