nvfp4-megamoe-kernel/dsv4
biondizzle c5adbbfde6 FMHA sink: don't double-scale sink bias
The sink bias from the checkpoint is already in the scaled domain
(added to QK*scale in the reference softmax). The kernel's
running_max is max(QK*scale), so the sink should be compared
directly without multiplying by scale again.
2026-05-31 23:12:20 +00:00
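
The reasoning in the commit message can be checked numerically. The sketch below is not the kernel code: it is a plain-Python model of an online-softmax loop, and every name in it (`reference_weights`, `streaming_weights`, the `double_scale` flag, the block size, and the sample values) is hypothetical. It assumes the sink acts as one extra logit that contributes only to the softmax denominator, and that the checkpoint stores it already in the scaled domain, as the message states.

```python
import math


def reference_weights(qk, scale, sink):
    """Reference softmax: sink is added next to QK*scale, unscaled."""
    logits = [s * scale for s in qk]
    m = max(max(logits), sink)
    exps = [math.exp(x - m) for x in logits]
    # The sink only enlarges the denominator; it has no value row.
    denom = sum(exps) + math.exp(sink - m)
    return [e / denom for e in exps]


def streaming_weights(qk, scale, sink, block=2, double_scale=False):
    """Online softmax over blocks, folding the sink in at the end.

    double_scale=True models the bug described in the commit message
    (sink multiplied by scale a second time).
    """
    running_max = -math.inf
    running_sum = 0.0
    for start in range(0, len(qk), block):
        scaled = [s * scale for s in qk[start:start + block]]
        new_max = max(running_max, max(scaled))
        # Rescale the old sum to the new max, then add this block.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in scaled)
        running_max = new_max

    # running_max is max(QK*scale), so the sink is compared as-is.
    sink_term = sink * scale if double_scale else sink
    final_max = max(running_max, sink_term)
    denom = running_sum * math.exp(running_max - final_max)
    denom += math.exp(sink_term - final_max)
    return [math.exp(s * scale - final_max) / denom for s in qk]


if __name__ == "__main__":
    qk, scale, sink = [4.0, -2.0, 8.0, 1.0], 0.125, 1.5
    print(reference_weights(qk, scale, sink))
    print(streaming_weights(qk, scale, sink))
    print(streaming_weights(qk, scale, sink, double_scale=True))
```

With the sink compared directly, the streaming result agrees with the reference; with `double_scale=True` the sink term shrinks by a factor of `scale`, the denominator changes, and the attention weights drift away from the reference.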