biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:57:51 +00:00
fe0588d906 fix: simplify UMMA dump script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:55:44 +00:00
948a3f8a7a add UMMA descriptor dump script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:53:37 +00:00
e5ba0ca119 debug: clean QK verify with scalar sanity + MMA result
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:49:38 +00:00
a04d794979 debug: skip TMEM alloc — test SMEM loads only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:48:10 +00:00
72c97f2546 debug: minimal UMMA descriptor (just start_addr + version)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:47:00 +00:00
9a51bfa578 fix: align SMEM layout properly (128B aligned tmem + Q)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:44:56 +00:00
2a765be715 fix: correct SMEM size for row-major (not swizzled)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:43:41 +00:00
c64bd7b875 debug: read Q/K directly from SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:41:32 +00:00
58b610c96c fix: proper early return for SMEM load test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:40:19 +00:00
82bc2c4a49 debug: verify SMEM loads + scalar QK sanity check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:39:13 +00:00
53139d24bf debug: verify TMEM r/w works before MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:38:10 +00:00
a9d71ff6ab debug: print TMEM values after MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:37:03 +00:00
bfb1e177ce debug: try all-lane MMA + print tmem_base
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:35:32 +00:00
d3510980e4 feat: SWIZZLE_NONE UMMA descriptors with row-major SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:23:39 +00:00
8c67c31497 add CuTe descriptor printing script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:20:57 +00:00
d29d6b575f add UMMA descriptor diagnostic script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:18:48 +00:00
ab84ad0f86 feat: implement canonical UMMA SMEM layout with SWIZZLE_128B
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:07:55 +00:00
ecbc75255c fix: correct UMMA descriptor format from CUTLASS source
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:03:55 +00:00
fe7d561143 debug: print UMMA descriptor values for diagnosis
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-28 08:02:56 +00:00
c5f7a9a15c fix: align SMEM buffers to 16 bytes for UMMA descriptors