biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:23:47 +00:00
f3503fc1ee FIX: TMEM offset bug in O rescale/normalize — use tOtO0.iterator not tOtO.iterator
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:07:09 +00:00
0b7ae7c969 Diag: test n=384 (3 tiles) to find crash boundary
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:06:30 +00:00
640ec3e96e Diag: test all sizes 128-1024
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:05:47 +00:00
02d993ecac DEBUG: disable O rescale to isolate NaN cause
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:01:14 +00:00
1c3970fe58 Add NaN/inf checking to stage C test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 18:00:02 +00:00
d7a0fc2bc2 CRITICAL FIX: K GMEM slice (None,None,0,0) not (None,0,None,0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:59:03 +00:00
b6a2904e93 Diag: try K slice (None,None,0,0) keeping mode 1 (CUTLASS ref style)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:58:00 +00:00
01621e1520 Diag: try runtime Int32(0+0) for kv_coord with cutlass.range
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:57:01 +00:00
beecc4df47 Diag: use Python range() unrolling like stage C test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:56:16 +00:00
200430bd3f Fix diagnostic test: same Int32(kt) + n_kv_tiles fixes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:51:27 +00:00
c23ebd5b57 Try cutlass.range with Int32(kt) — now n_kv_tiles is Python int
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:50:09 +00:00
4a41df51c4 FIX: n_kv_tiles as Python int (s_k//128) for range() unrolling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:47:46 +00:00
70409636f7 Option 2: Python range() with Int32(kt) for TMA GMEM coord
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:47:11 +00:00
b55a38c4c3 Add example5: use cutlass.range induction variable as TMA GMEM coord
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:41:24 +00:00
aacad257ea README: add fire_b200_test docs, update multi-tile blocker with real findings
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:39:29 +00:00
93c28b9c29 Clean up debug prints, set kv_coord as Int32(0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:34:31 +00:00
1bba851911 DEBUG: try plain Python int kv_coord (like CUTLASS ref)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:32:54 +00:00
15b2a28d29 DEBUG: hardcode kv_coord=1 to test if TMA uses it
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:30:07 +00:00
ff9ef6dcde DEBUG: try K slice (None,0,None,0) keeping mode 2 free
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 17:28:48 +00:00
cec6f59d66 DEBUG: print tBgK/tVgV shapes before/after slice