biondizzle
bfc1518046
FMHA Stage-C2: production 12-warp pipeline with correction warps
- Softmax warps (0-3): S→softmax→P, vec=[old_max,new_max]→TMEM
- Correction warps (4-7): O rescale in TMEM, final normalize by row_sum
- MMA warp (8): QK→S, PV→O with pipeline chaining
- TMA warp (9): Q/K/V load
- Epilogue warp (10): O TMEM→GMEM via epilogue_tma_store
- Empty warp (11): TMEM dealloc mbarrier init
- Pipeline: mma_s→softmax→s_corr→correction→corr_epi→epilogue + mma_corr→correction
- Supports multi-tile KV with online O rescale
- Follows CUTLASS FMHA correction_rescale pattern exactly
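The multi-tile KV path with online O rescale can be illustrated on the host. The following is a minimal single-row reference sketch, not the kernel code from this commit: the function names `attend_online` and `attend_reference`, the row-major layout of `V`, and the assumption that all scores are finite are hypothetical choices made for illustration. It shows the split described above: a softmax step produces the pair [old_max, new_max] per KV tile, a correction step rescales the running O by exp(old_max - new_max), and the final output is normalized by row_sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-side reference for ONE query row (not the kernel's code).
// scores: S = q . K^T, length n, assumed finite. V: n x d, row-major.
// tile:   number of KV positions processed per step.
std::vector<float> attend_online(const std::vector<float>& scores,
                                 const std::vector<float>& V,
                                 std::size_t d, std::size_t tile) {
    const std::size_t n = scores.size();
    assert(n > 0 && d > 0 && tile > 0 && V.size() == n * d);

    std::vector<float> O(d, 0.0f);  // running, unnormalized output
    float row_max = -std::numeric_limits<float>::infinity();
    float row_sum = 0.0f;

    for (std::size_t start = 0; start < n; start += tile) {
        const std::size_t end = std::min(n, start + tile);

        // Softmax step: compute the new running max over this tile.
        float new_max = row_max;
        for (std::size_t i = start; i < end; ++i)
            new_max = std::max(new_max, scores[i]);

        // Correction step: rescale what was accumulated under old_max.
        // On the first tile old_max is -inf, so the factor is exp(-inf) = 0.
        const float scale = std::exp(row_max - new_max);
        row_sum *= scale;
        for (float& o : O) o *= scale;

        // P = exp(S - new_max), then accumulate P * V into O.
        for (std::size_t i = start; i < end; ++i) {
            const float p = std::exp(scores[i] - new_max);
            row_sum += p;
            for (std::size_t j = 0; j < d; ++j) O[j] += p * V[i * d + j];
        }
        row_max = new_max;
    }

    // Final normalize by row_sum.
    for (float& o : O) o /= row_sum;
    return O;
}

// Two-pass softmax over the whole row, used only as a check.
std::vector<float> attend_reference(const std::vector<float>& scores,
                                    const std::vector<float>& V,
                                    std::size_t d) {
    const std::size_t n = scores.size();
    assert(n > 0 && d > 0 && V.size() == n * d);
    const float m = *std::max_element(scores.begin(), scores.end());
    float sum = 0.0f;
    std::vector<float> O(d, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        const float p = std::exp(scores[i] - m);
        sum += p;
        for (std::size_t j = 0; j < d; ++j) O[j] += p * V[i * d + j];
    }
    for (float& o : O) o /= sum;
    return O;
}
```

Calling `attend_online` with any tile size should agree with `attend_reference` up to floating-point rounding, which is the property the deferred rescale relies on.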
2026-05-22 09:42:39 +00:00