biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:38:26 +00:00
059c2e6cd9 D1: P store as BF16 using PV A-fragment layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:36:42 +00:00
2efd6be8af D1: P store uses tOrP0.layout (PV A-fragment TMEM layout)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:35:51 +00:00
7751eab711 D1 fix: P store uses PV A-fragment layout (p_tmem_s.outer)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:34:00 +00:00
fe1826b0de D1: test raw unnormalized output via epilogue_tma_store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:32:55 +00:00
091cb59be5 test: paired atoms epilog from old commit 6ee28d8
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:29:52 +00:00
f23d55fd3f D1: paired atoms epilogue (no TMEM round-trip)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:26:17 +00:00
7df3c7c952 d1: sweep hd=64,128,256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:25:31 +00:00
81378133cc fix: use mV.iterator
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:25:00 +00:00
a66a9efd4c fix: use mQ not q for LayoutEnum
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:24:17 +00:00
d2aaab5a32 d1: add diagnostic script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:22:26 +00:00
a2d063a48b D1: N-tile support for HEAD_DIM>256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:20:49 +00:00
7bc097163d d1: add hd=512 test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:20:13 +00:00
32995c2ba3 d1: add quick regression test (hd=64 only)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:19:56 +00:00
eed981bee5 D1: Parameterize HEAD_DIM in FmhaKernel (64→512)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:10:43 +00:00
1a6c5e3822 docs: revised Stage D/E plan — indexer removes paged TMA, one kernel for CSA/HCA/SWA, sink merge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:05:10 +00:00
a846193c4a cleanup: remove archive/ (240 stale files), stale example9/10, fix test table, add Stage D plan
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 03:01:04 +00:00
f3d0d67ae9 docs: update README with Stage C TMEM layout mismatch findings and status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 02:54:55 +00:00
9c331de7ba fix: revert to composition layout for hand-constructed atoms (matching CUTLASS)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 02:54:02 +00:00
3a2d3c66da fix: use logical_divide (not composition) for O rescale/normalize atoms to match get_tmem_load_op layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 02:50:55 +00:00
3aba5cc6da fix: add NO-OP TMEM round-trip to re-map O from MMA to epilog layout