nvfp4-megamoe-kernel/dsv4
biondizzle 10915c4e70 fix: remove double normalization in fmha_6warp_multihead epilogue
P was already normalized in the softmax step, so PV = P_norm @ V
already gives the correct attention output. Dividing by row_sum again
in the epilogue produces O = O_correct / row_sum, which is 128x too
small for uniform data.
2026-05-30 08:26:20 +00:00
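
A minimal pure-Python sketch of the effect described in the commit message. It is not the kernel code: the helper names (`softmax_rows`, `matmul`), `SEQ_LEN = 128` and `HEAD_DIM = 4` are assumptions chosen so that uniform scores give row_sum = 128, matching the "128x too small" figure above.

```python
import math

def softmax_rows(scores):
    """Return (normalized P, row_sum) per row, using max subtraction."""
    probs, row_sums = [], []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)                      # row_sum of the unnormalized exps
        probs.append([e / total for e in exps])  # P is normalized here
        row_sums.append(total)
    return probs, row_sums

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

SEQ_LEN = 128   # assumed number of keys per query row
HEAD_DIM = 4    # assumed head dimension, kept small for illustration

# Uniform data: every score is equal and every value is 1.0.
scores = [[0.0] * SEQ_LEN for _ in range(2)]
values = [[1.0] * HEAD_DIM for _ in range(SEQ_LEN)]

p_norm, row_sum = softmax_rows(scores)

# Correct epilogue: P is already normalized, so P_norm @ V is the output.
o_correct = matmul(p_norm, values)

# Buggy epilogue: dividing by row_sum a second time.
o_buggy = [[x / row_sum[i] for x in row] for i, row in enumerate(o_correct)]

ratio = o_correct[0][0] / o_buggy[0][0]
print(o_correct[0][0], o_buggy[0][0], ratio)
```

With uniform inputs the ratio between the two outputs equals row_sum, so the size of the error depends on the number of keys and on the score distribution rather than being a fixed constant.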