biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:41:29 +00:00
74e1c0420a D1.5: Implement correction epilog with paired atoms (get_tmem_load_op + get_smem_store_op)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:37:37 +00:00
d96786ec44 D1.5: Add TODO for correction epilog - keeping working TMEM round-trip for now
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:35:06 +00:00
ae7a1f5e0a D1.5: Revert to pre-epilog backup - correction epilog refactor is complex, will do incrementally
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:32:37 +00:00
a59d57e4d5 D1.5: Fix TMA store - use local_tile with pv_mma_tiler
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:31:46 +00:00
a6bf31a22e D1.5: Fix TMA store rank mismatch - use 2D sC_epi view
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:30:41 +00:00
d316875145 D1.5: Implement correction epilog with get_tmem_load_op + get_smem_store_op paired atoms
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:24:27 +00:00
8514a72ba0 D1.5: Replace TMEM round-trip normalize with correction epilog (one-way: TMEM→reg→SMEM→GMEM)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:16:24 +00:00
26de7254ad D1.3: Fix LSE tensor layout for weakly congruent store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:15:43 +00:00
f259fafcae D1.3: Add unnormalized debug test to isolate SMEM-P vs O round-trip error
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:13:30 +00:00
b72062e47c D1.3: Add SMEM-P write/read diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:10:20 +00:00
6d6b91dcb4 D1.3: Add SMEM-P vs TMEM-P comparison test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:07:23 +00:00
f1b8fef3a2 D1.3: Fix while loop in cotiled diag - precompute num_tmem_alloc_cols
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:06:52 +00:00
921decb516 D1.3: Fix cotiled diagnostic - use proper MMA construction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:05:49 +00:00
25747675cf D1.3: Add make_cotiled_copy diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 00:01:24 +00:00
24bc318480 shit left dangling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 23:26:48 +00:00
d092a1743a D1.3: Re-enable coordinate-indexed SMEM-P write with identity tensor coords
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 23:26:08 +00:00
a17dca508d D1.3: Revert to zero-fill for sP - need to verify sP→PV pipeline first
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 23:24:55 +00:00
5be5d42e94 D1.3: Compute (m,k) directly from thread mapping instead of identity tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 23:24:14 +00:00
23964d28c0 D1.3: Add debug prints for SMEM-P coordinate mapping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 23:23:07 +00:00
1e5635b93f D1.3: Add SMEM-P coordinate diagnostic test