biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:05:15 +00:00
a762820352 D1.3: Use MLIR-compatible expression for tOrP0 offset (same as Stage C)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:03:55 +00:00
4fa4239f95 D1.3: Initialize tOrP0 before conditional for CuTeDSL scoping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:03:02 +00:00
2bb3eb95ed D1.3: Fix tOrP0 for SMEM-P - skip make_tensor when offset is 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:02:05 +00:00
eabea91b64 D1.3: Fix tOrP0 offset - scale FP32 columns to BF16 elements
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:01:20 +00:00
47eade4afc D1.3: Fix CuTeDSL scoping - define tOrP0 unconditionally with p0 offset
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 21:00:31 +00:00
0e81fc18aa D1.3: Fix critical bug - add TMEM column offset for P0 in PV GEMM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:58:07 +00:00
29f4480e26 D1.3: Revert to d1.3-pre-sm100-helpers baseline for testing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:57:14 +00:00
adb4398505 D1.3: DIAGNOSTIC - test epilogue_tma_store raw PV without any round-trips
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:56:15 +00:00
0e41816636 D1.3: Remove NO-OP round-trip, keep normalize + epilogue_tma_store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:54:29 +00:00
8bc8b21470 D1.3: Full correction_epilog with TMA store, normalize in reg before SMEM write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:52:45 +00:00
d769e01a16 D1.3: Apply transform_partitioned_tensor_layout before epilogue helpers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:50:31 +00:00
cc18fddc7e D1.3: Replace NO-OP TMEM round-trip with correction_epilog using epilogue_tmem_copy_and_partition + epilogue_smem_copy_and_partition
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:20:15 +00:00
993ec32567 SMEM-P: test permutation 4 (swap m↔n2)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:19:24 +00:00
c7a299d7d9 SMEM-P: add iterator offset debug print
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:14:34 +00:00
4943af749d SMEM-P: add tCrP debug print, reset permute to 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:14:22 +00:00
5a11f7c09a SMEM-P: test permutation 1 (swap m↔n0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:13:49 +00:00
d5081fe6f0 auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:13:46 +00:00
fd54d657b2 SMEM-P: add debug_permute flag for coordinate permutation testing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:12:29 +00:00
06409401ca SMEM-P: disable debug flags, revert to original mapping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 20:11:51 +00:00
b8f0f0890a SMEM-P: fix scoping error, disable debug_p_one, enable debug_swap_mn