biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:25:56 +00:00
5a08b79364 Revert "debug: test 12w identity softmax with n=256 to verify multi-tile pipeline"
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:24:54 +00:00
6cf8702e3c debug: test 12w identity softmax with n=256 to verify multi-tile pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:23:39 +00:00
a3c9af8fa3 debug: disable O rescaling to test multi-tile pipeline baseline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:22:50 +00:00
c175ec4f09 fix: revert to scaled row_max, use exp2(old_max - new_max) for O rescaling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:21:53 +00:00
35c8043064 fix: compute row_max from RAW S values, not scaled
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:20:38 +00:00
f9f5647eaa fix: missing newline after self.s_k = s_k
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:19:50 +00:00
e0c320929a fix: add s_k param to FmhaV3StageC, use self.s_k for V FMHA reconstruction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:19:09 +00:00
fb4ffd8cf7 Stage C: add online O rescaling for multi-tile KV + test n=256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:17:10 +00:00
94b0d97107 fix: add epilogue warp to tmem_bar, restore wait_for_alloc in epilogue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:15:29 +00:00
65e52f5934 fix: add softmax_done_bar to synchronize MMA PV with softmax P production
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:11:58 +00:00
ea687980af fix: epilogue warp self-signals acc_pipe producer before consuming
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:05:52 +00:00
19b742f365 fix: remove duplicate tmem free from epilogue (MMA warp handles dealloc)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 10:03:09 +00:00
0a3815049f fix: add acc_pipe pipeline for epilogue, matching 12w pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:59:22 +00:00
59f4d8a469 fix: epilogue_warp_id must be tuple for epilogue_tma_store, check with [0]
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:56:19 +00:00
6ba12b7890 fix: epilogue warp reuse mma_corr_cons pipeline instead of creating new one from st
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:53:01 +00:00
540399eca3 fix: define cS and tScS in correction warps (not visible across if blocks)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:49:38 +00:00
ee859099bd fix: correct @cute.kernel indentation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:46:11 +00:00
fc7a790fbd fix: remove duplicate @cute.kernel decorator
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:42:40 +00:00
78aac51ab9 FMHA Stage-C2: production 12-warp pipeline with correction warps
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-22 09:39:17 +00:00
78d9024a67 README: update Stage C status to WORKING, add CuTeDSL constraints and target architecture
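
Several of the commits above (fb4ffd8cf7, 35c8043064, c175ec4f09) concern online rescaling of the O accumulator when the KV sequence spans more than one tile. The following is a minimal pure-Python sketch of that general technique for a single query row. It is not the kernel's code: the function names, the tile size of 128, and the use of scalar values per key are assumptions made for illustration. It keeps the running maximum in the scaled (base-2) domain and rescales by exp2(old_max - new_max), as c175ec4f09 describes.

```python
import math

LOG2E = math.log2(math.e)  # converts natural-exp scores to the base-2 domain


def softmax_weighted_sum(scores, values):
    """Reference: softmax over all scores at once, then weighted sum of values."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def online_weighted_sum(scores, values, tile=128):
    """Same result, processing scores tile by tile (tile size is an assumption)."""
    row_max = -math.inf  # running max of the scaled scores
    row_sum = 0.0        # running softmax denominator
    acc = 0.0            # running unnormalized output (the "O" accumulator)
    for start in range(0, len(scores), tile):
        scaled = [s * LOG2E for s in scores[start:start + tile]]
        vals = values[start:start + tile]
        new_max = max(row_max, max(scaled))
        # exp2(old_max - new_max); evaluates to 0.0 on the first tile (old_max = -inf)
        correction = 2.0 ** (row_max - new_max)
        p = [2.0 ** (x - new_max) for x in scaled]
        row_sum = row_sum * correction + sum(p)
        acc = acc * correction + sum(pi * vi for pi, vi in zip(p, vals))
        row_max = new_max
    return acc / row_sum


# Two tiles of 128 keys, mirroring the n=256 test size mentioned in the log.
scores = [5.0 * math.sin(i) for i in range(256)]
values = [math.cos(i) for i in range(256)]
difference = abs(online_weighted_sum(scores, values) - softmax_weighted_sum(scores, values))
```

With n=256 and a tile of 128 the loop runs twice, so the correction factor is exercised once with a real previous maximum. Skipping the `acc * correction` step would leave the first tile's contribution normalized against a stale maximum whenever the second tile raises it.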