nvfp4-megamoe-kernel/tests
biondizzle 6f4bb0842e softmax: element-wise row_max computation instead of .reduce()
Calling .reduce() on the C-fragment returns the global max across all rows,
not the per-row max. Compute row_max element-wise from the S values before
the exp2 pass, and accumulate row_sum during the exp2 pass.
2026-05-22 09:27:36 +00:00
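
The difference between a global max and a per-row max can be sketched in plain Python. This is an illustration only, not the kernel's code: `exp2_pass`, `softmax_rows`, and the sample matrix are hypothetical names and data, and nested lists stand in for the C-fragment. It assumes the usual exp2 formulation, where exp(x) is evaluated as 2^(x * log2(e)).

```python
import math

LOG2E = math.log2(math.e)  # exp(x) == 2 ** (x * LOG2E)


def exp2_pass(row, row_max):
    """Exponentiate one row relative to row_max, accumulating row_sum in the same loop."""
    row_sum = 0.0
    probs = []
    for s in row:
        e = 2.0 ** ((s - row_max) * LOG2E)
        probs.append(e)
        row_sum += e  # fused accumulation, no separate reduction pass
    return probs, row_sum


def softmax_rows(S):
    """Row-wise softmax with a max computed separately for each row."""
    out = []
    for row in S:
        # Element-wise max over this row only (not over the whole matrix).
        row_max = -math.inf
        for s in row:
            row_max = max(row_max, s)
        probs, row_sum = exp2_pass(row, row_max)
        out.append([e / row_sum for e in probs])
    return out


if __name__ == "__main__":
    S = [[0.0, 1.0], [-2000.0, -1999.0]]
    print(softmax_rows(S))

    # With a single max shared by all rows, the second row underflows to zero.
    global_max = max(s for row in S for s in row)
    print(exp2_pass(S[1], global_max))
```

Softmax is invariant to the shift that is subtracted, so a shared max is not wrong on paper. In floating point, though, a row whose values sit far below the global max underflows to zero in the exp2 pass, leaving a zero row_sum to divide by. A per-row max keeps the largest exponent in every row at exactly zero.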