biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-27 04:54:28 +00:00
3a7d87adba Fix test_smem_acc: use keyword args for lse/row_sums
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-27 04:53:48 +00:00
6a621bdf64 D1.5: SMEM accumulator FMHA kernel — one-way TMEM→REGS→SMEM, no round-trip
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-27 02:17:53 +00:00
81acf1593c Revert "D1.5: WIP SMEM accumulator — framework in place, accumulation logic TODO"
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-27 02:15:26 +00:00
72d88af400 D1.5: WIP SMEM accumulator — framework in place, accumulation logic TODO
a6da93ddfb Revert "D1.5: Try O rescale with tCtO_base layout (epilogue-proven TMEM addressing)"
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-27 02:10:41 +00:00
79e2eb3b42 D1.5: Try O rescale with tCtO_base layout (epilogue-proven TMEM addressing)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 21:00:42 +00:00
f94978ffa7 D1.5: Prepare for SMEM accumulator implementation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:55:19 +00:00
afb93eae22 D1.5: Revert broken TMEM round-trip O rescale, document as fundamentally broken
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:46:00 +00:00
42c5793add D1.5: Add isolated round-trip test comparing s_k=128 vs s_k=256 with NOOP rescale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:43:34 +00:00
e35b30dae6 D1.5 debug: try corr_tile_size=32 for O rescale round-trip
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:31:29 +00:00
20ed6d5114 D1.5: Add TMEM load fence before PV with ACCUMULATE, revert debug rescale factor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:29:39 +00:00
34d64137ec D1.5 debug: force rescale_factor=0.5 to test if round-trip code executes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:28:57 +00:00
3be708d923 D1.5 debug: add NOOP rescale test (acc_scale=1.0) to isolate TMEM round-trip corruption
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:27:38 +00:00
c3648e4ebf D1.5 debug: add targeted s_k=256 rescale diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 20:26:10 +00:00
bf2c7c8bb8 D1.5: Implement in-kernel O rescale via CUTLASS correction_rescale pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:53:13 +00:00
064ececc9a Update docs: D1.5 TMEM round-trip fundamentally broken, Python KV merge is production path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:50:33 +00:00
2b4f4ce538 Remove broken D1.5 paired-atom test (TMEM round-trip is fundamentally broken)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:50:17 +00:00
ffb3e736bb D1.5: Revert broken paired-atom O rescale — TMEM round-trip fundamentally broken
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:46:21 +00:00
40cbf0c223 Add D1.5 paired-atom O rescale test (s_k=256/384, hd=64/128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:34:28 +00:00
43f0b5d1e8 D1.5: Fix O rescale with paired atoms (incremental approach)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 19:11:20 +00:00
4bb0e063cc D1.5: Replace broken TMEM round-trip with correction epilogue (paired atoms)