nvfp4-megamoe-kernel/dsv4/kernels
biondizzle 79e2eb3b42 D1.5: Try O rescale with tCtO_base layout (epilogue-proven TMEM addressing)
Previous attempts used tOtO0 (from pv_thr.make_fragment_C), which corrupted the data.
This version uses tCtO_base (from pv_mma.make_fragment_C), which is the SAME
tensor the epilogue successfully reads O from. The load and store atoms are both
built from the same tCtO_i via composition, following the CUTLASS correction_rescale pattern.
2026-05-27 02:10:39 +00:00
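The idea behind this change is that the O rescale must address O through the same layout that the MMA wrote it with and the epilogue reads it with. The following is a minimal Python sketch of why that matters. It is a toy model, not CuTe code. TMEM is a flat list, and a "layout" is a hypothetical function that maps a logical (row, col) of O to an address. The sketch also assumes the usual online-softmax correction, in which each query row of O is multiplied by its own factor. The names mma_layout, mismatched_layout, mma_write, rescale_O, and epilogue_read are illustrative only.

```python
# Toy model (hypothetical): TMEM is a flat list of floats, and a "layout" maps
# a logical (row, col) element of O to a TMEM address.
ROWS, COLS = 4, 4

def mma_layout(r, c):
    # Stand-in for the layout the MMA writes O with and the epilogue reads it with.
    return r * COLS + c

def mismatched_layout(r, c):
    # Stand-in for a different partition of the same buffer (here, column-major).
    return c * ROWS + r

def mma_write(O):
    # The MMA deposits O into TMEM using its own layout.
    tmem = [0.0] * (ROWS * COLS)
    for r in range(ROWS):
        for c in range(COLS):
            tmem[mma_layout(r, c)] = O[r][c]
    return tmem

def rescale_O(tmem, layout, row_scale):
    # Load and store go through the SAME layout, and each element is scaled
    # by the factor of the row that this layout believes it belongs to.
    for r in range(ROWS):
        for c in range(COLS):
            addr = layout(r, c)
            value = tmem[addr]                  # "load atom"
            tmem[addr] = value * row_scale[r]   # "store atom"
    return tmem

def epilogue_read(tmem):
    # The epilogue reads O back using the MMA layout.
    return [[tmem[mma_layout(r, c)] for c in range(COLS)] for r in range(ROWS)]

O = [[1.0] * COLS for _ in range(ROWS)]
row_scale = [1.0, 2.0, 3.0, 4.0]   # one correction factor per query row
expected = [[O[r][c] * row_scale[r] for c in range(COLS)] for r in range(ROWS)]

good = epilogue_read(rescale_O(mma_write(O), mma_layout, row_scale))
bad = epilogue_read(rescale_O(mma_write(O), mismatched_layout, row_scale))
print(good == expected)  # True
print(bad == expected)   # False: each row picked up the column's factor instead
```

Because load and store share one layout, nothing is lost or duplicated even in the mismatched case. With a single uniform scale factor, the wrong layout would go unnoticed. The per-row factor is what exposes the mismatch, because each row's factor lands on elements that belong to other rows.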