nvfp4-megamoe-kernel/tests/unit
biondizzle aed0e50946 Add O rescale with pre-built paired atoms (corr_tile_size=16)
Set up the correction_rescale atoms before the softmax loop so they can be
shared between the per-tile O rescale and the final normalize. The final
normalize uses the working 2D register tensor pattern. The O rescale uses
a simple 1D rmem tensor per sub-tile (same as example10).
2026-05-23 00:28:44 +00:00