biondizzle
  Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:19:55 +00:00
791cd8b9c7 merge: keep our fmha.py (coordinate-indexed SMEM-P + epilogue_tma_store)
6313974fba D1.5: Fix SMEM-P - use coordinate-indexed store (same proven pattern)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:19:19 +00:00
3fcb7a0a48 feat: SMEM-P with make_tiled_copy_tv + partition_S
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:18:40 +00:00
153db24be2 D1.5: Always output un-normalized O + LSE (epilogue_tma_store only, no TMEM round-trip normalize)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 03:16:35 +00:00
d68ab348bb feat: SMEM-P using make_tiled_copy_A from PV MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:45:09 +00:00
b4a985631b fix: fence_proxy not fence
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:44:15 +00:00
95f0898c64 merge: resolve conflict (keep our version)
228ec3c638 D1.5: Replace broken make_cotiled_copy SMEM-P with coordinate-indexed store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:43:13 +00:00
c58ca550ae feat: SMEM-P with make_tiled_copy_tv + manual fill
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:42:30 +00:00
8faac948fc feat: SMEM-P using make_tiled_copy_tv + logical sP view
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:41:40 +00:00
eda7d40df2 Merge branch 'master' of ssh://sweetapi.com:2222/biondizzle/nvfp4-megamoe-kernel
952c25e227 D1.5: Use tCtO_fake layout for epilogue_tma_store (needs STAGE dim)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:41:23 +00:00
0a980de7ad feat: SMEM-P using make_cotiled_copy (one-row-per-thread)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:40:49 +00:00
85eb2bc4bb D1.5: Remove duplicate tTMrO definition (keep unconditional one)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:40:17 +00:00
83077db55e merge
86ff386ea8 D1.5: Move tTMrO after O rescale atoms (fix tTMEM_LOADcO reference)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:36:16 +00:00
cd223e1b98 fix: reorder tTMrO definition after tTMEM_LOADcO
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:34:54 +00:00
54e94d44ef fix: tTMrO scoping + restore SMEM-P coordinate write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:32:53 +00:00
6ead708c7d D1.5: Move tTMrO def before softmax loop (CuTeDSL scoping)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:31:14 +00:00
5a34865062 debug: zero-fill sP to check deadlock
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:30:33 +00:00
81652629e3 D1.5: Use proven Stage C approach - normalize via TMEM round-trip + epilogue_tma_store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 02:15:09 +00:00
974cddbf7b test: add try/except for SMEM-P coord test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 01:59:26 +00:00
5fd556db63 test: use FmhaKernel for SMEM-P coord test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 01:58:39 +00:00
e50ba7212c test: SMEM-P coordinate verification test