nvfp4-megamoe-kernel/tests/unit
biondizzle b80a1ab083 Stage C: add online O rescaling for multi-tile KV + test n=256
- Move O TMEM load/store setup before softmax loop
- After P store: rescale O in TMEM by exp2((old_max - new_max) * scale)
- Only rescale for kt > 0 (first tile has no prior O to rescale)
- Use same TMEM load/modify/store pattern as final normalization
- Test both n=128 (1 tile) and n=256 (2 tiles)
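The rescaling described in the bullets above can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the kernel's code: it handles a single query row in pure Python, the function names (`attention_reference`, `attention_online`) and the `sm_scale` parameter are hypothetical, and `scale` is assumed to fold log2(e) into the softmax scale so that exp2 can stand in for exp. The tile size of 128 is inferred from the n=128 (1 tile) and n=256 (2 tiles) test cases.

```python
import math
import random

TILE = 128  # assumed KV tile size: n=128 is 1 tile, n=256 is 2 tiles


def attention_reference(q, K, V, sm_scale):
    """Plain softmax(q . K^T * sm_scale) @ V for one query row."""
    s = [sm_scale * sum(a * b for a, b in zip(q, k)) for k in K]
    m = max(s)
    p = [math.exp(x - m) for x in s]
    denom = sum(p)
    return [sum(p[j] * V[j][c] for j in range(len(V))) / denom
            for c in range(len(V[0]))]


def attention_online(q, K, V, sm_scale, tile=TILE):
    """Tile-by-tile softmax with online rescaling of the O accumulator."""
    scale = sm_scale * math.log2(math.e)  # assumption: log2(e) folded into scale
    o = [0.0] * len(V[0])                 # stands in for the O accumulator in TMEM
    row_max = -math.inf
    row_sum = 0.0
    for kt, start in enumerate(range(0, len(K), tile)):
        k_tile = K[start:start + tile]
        v_tile = V[start:start + tile]
        s = [sum(a * b for a, b in zip(q, k)) for k in k_tile]  # raw scores
        new_max = max(row_max, max(s))
        p = [2.0 ** ((x - new_max) * scale) for x in s]
        if kt > 0:
            # Rescale only when a prior tile contributed to O and the row sum.
            alpha = 2.0 ** ((row_max - new_max) * scale)
            o = [alpha * x for x in o]
            row_sum *= alpha
        row_sum += sum(p)
        for j, pj in enumerate(p):        # O += P @ V for this tile
            for c in range(len(o)):
                o[c] += pj * v_tile[j][c]
        row_max = new_max
    return [x / row_sum for x in o]       # final normalization


if __name__ == "__main__":
    random.seed(0)
    for n in (128, 256):
        q = [random.gauss(0, 1) for _ in range(8)]
        K = [[random.gauss(0, 1) for _ in range(8)] for _ in range(n)]
        V = [[random.gauss(0, 1) for _ in range(8)] for _ in range(n)]
        ref = attention_reference(q, K, V, 0.35)
        out = attention_online(q, K, V, 0.35)
        print(n, max(abs(a - b) for a, b in zip(ref, out)))
```

Skipping the rescale for the first tile mirrors the `kt > 0` condition in the commit: with no prior accumulator there is nothing to correct. The running row sum is rescaled by the same factor so that the final division stays consistent with the rescaled O.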
2026-05-22 10:19:08 +00:00