biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:21:25 +00:00
bb92af5b0c FIX: Use full 8D indexing for tBgK/tVgV — mode 4 is the GMEM tile dim
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:20:47 +00:00
2a9f764f8b Diagnostic: check tBgK/tVgV layout strides for degenerate dims
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:14:44 +00:00
ae173d3963 Test identity diag multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:13:39 +00:00
a35fb1b077 Minimal reference FMHA test: n=256 only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:12:50 +00:00
4fc264e034 Test reference FMHA with proper API
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 21:12:03 +00:00
a800a83d5c Test: CUTLASS reference FMHA on B200 multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:37:23 +00:00
2a14c2dd18 REVERT to working baseline (n=128 cos 0.999998). Multi-tile TMA is a CuTeDSL JIT limitation.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:36:21 +00:00
1ab326f2d2 Test: use kvh.index (pipeline state) as TMA GMEM coordinate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:35:46 +00:00
7b8b022e23 SMEM counter: separate allocate_tensor instead of struct field
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:35:19 +00:00
462778efcf Fix SMEM counter type: cutlass.Int32 for MemRange
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:34:54 +00:00
f5c827d0b9 SMEM-backed kv_coord counter — JIT can't constant-fold SMEM reads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:34:23 +00:00
215282971c DEBUG: hardcoded Int32(1) to test if TMA can read tile 1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:34:06 +00:00
79ebe20a39 DEBUG: use Int32(kt) directly to test if coordinate matters
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:33:42 +00:00
b3778896b9 Test: kv_coord = warp_idx() * 0 — force SSA from runtime value
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:33:05 +00:00
1de848c5ca DEBUG: add cute.printf for kv_coord runtime value
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:32:46 +00:00
587c16679c Test: Python range() instead of cutlass.range() for TMA loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:32:27 +00:00
91230fe5e6 Test example9: drop try_acquire/pk, single loop-carried kv_coord
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:28:18 +00:00
3c0451a3e5 REVERT to working example7 (n=128 cos 0.999998). Example8 TMA fix didn't work.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:28:01 +00:00
880bd9ef81 Update stage_c test to example8: SSA kv_coord + per-tile O rescale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 20:25:30 +00:00
c395b279d2 Clean up tests: archive superseded files, keep only essential unit tests