nvfp4-megamoe-kernel/dsv4
biondizzle ea5662ab2b D1.5: Implement correction epilog with get_tmem_load_op + get_smem_store_op paired atoms
- One-way data path: TMEM → registers (normalize) → SMEM → GMEM
- Eliminates TMEM round-trip error for O normalization
- O rescale (kt > 0) still uses the old atoms; to be fixed later
- Based on CUTLASS FMHA reference's correction_epilog pattern
2026-05-24 00:30:38 +00:00
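For readers unfamiliar with the two steps the commit message names (O rescale for kt > 0, and the final O normalization in the correction epilog), here is a minimal host-side sketch in plain Python. It assumes the kernel follows the standard online-softmax formulation used by FMHA-style kernels; the commit itself does not confirm that. The names `attention_row` and `reference_row` are hypothetical and do not come from the repository, and nothing here models TMEM, SMEM, or the copy atoms. It only shows the arithmetic that the one-way path would carry.

```python
import math


def attention_row(score_tiles, value_tiles):
    """Online-softmax attention for one query row, one KV tile at a time.

    Illustrative only: `acc` stands in for the unnormalized O accumulator.
    """
    row_max = -math.inf
    row_sum = 0.0
    acc = None
    for kt, (scores, values) in enumerate(zip(score_tiles, value_tiles)):
        new_max = max(row_max, max(scores))
        if kt > 0:
            # O rescale: earlier tiles were accumulated against an older max.
            scale = math.exp(row_max - new_max)
            row_sum *= scale
            acc = [a * scale for a in acc]
        else:
            acc = [0.0] * len(values[0])
        for s, v in zip(scores, values):
            p = math.exp(s - new_max)
            row_sum += p
            acc = [a + p * x for a, x in zip(acc, v)]
        row_max = new_max
    # Final normalization: applied once, then the result is handed onward
    # (the step the commit routes through registers -> SMEM -> GMEM).
    inv = 1.0 / row_sum
    return [a * inv for a in acc]


def reference_row(scores, values):
    """Plain softmax-weighted sum over all keys, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


if __name__ == "__main__":
    score_tiles = [[1.0, 2.0], [3.0, 0.5]]
    value_tiles = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 0.0]]]
    tiled = attention_row(score_tiles, value_tiles)
    flat = reference_row(sum(score_tiles, []), sum(value_tiles, []))
    print(tiled, flat)
```

The tiled result should agree with the flat reference up to floating-point rounding. In this sketch the normalization happens exactly once, after the last tile, which is the property the one-way epilog path is described as preserving.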