nvfp4-megamoe-kernel/tests/unit
biondizzle f4b7bed463 Fix final normalize: use working 2D register tensor pattern from working_softmax_maybe.py
The make_rmem_tensor(tTMEM_LOADcO.shape) call creates a 1D tensor that doesn't
match the paired atom layout. The working pattern instead uses a 2D register
tensor with sub-tile composition (tTMrO_i_ = tTMrO[None, i] + composition).
2026-05-23 00:25:16 +00:00
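
The difference between the flat and the 2D fragment can be pictured with a plain-Python analogy. This is only a sketch, not CuTe DSL code and not the kernel's implementation: the names make_fragment_2d and normalize_by_subtile, the sizes, and the scale factor are all hypothetical. It assumes the second mode of the 2D tensor indexes sub-tiles, so each sub-tile can be picked out and scaled on its own, which a 1D tensor with no sub-tile mode does not allow.

```python
# Plain-Python analogy only; nothing here is the CuTe DSL API.
ELEMS_PER_TILE = 4   # assumed number of elements in one sub-tile
NUM_TILES = 2        # assumed number of sub-tiles in the fragment


def make_fragment_2d(values, elems_per_tile, num_tiles):
    """Group a flat list into num_tiles sub-tiles of elems_per_tile each."""
    if len(values) != elems_per_tile * num_tiles:
        raise ValueError("value count does not match the 2D shape")
    return [
        values[t * elems_per_tile:(t + 1) * elems_per_tile]
        for t in range(num_tiles)
    ]


def normalize_by_subtile(fragment, inv_row_sum):
    """Scale one sub-tile at a time, loosely like looping over tTMrO[None, i]."""
    return [[x * inv_row_sum for x in tile] for tile in fragment]


flat = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
frag = make_fragment_2d(flat, ELEMS_PER_TILE, NUM_TILES)
normalized = normalize_by_subtile(frag, 0.5)
```

In this picture, frag[i] plays the role of one sub-tile. The flat list has the same element count but no second index to select a sub-tile with, which is the kind of shape mismatch the commit message describes.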