nvfp4-megamoe-kernel/tests
biondizzle 7b8ee862bd add explicit acc_pipe.consumer_wait before final normalize
Race condition: softmax reads O to normalize while MMA may still be
writing PV[N-1]. The single-tile case passes only by luck; multi-tile
results drift. Construct acc_cons_st before the wait so the epilogue
reuses it.
2026-05-22 15:49:48 +00:00
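
The ordering problem described in the commit message can be illustrated with a small host-side analogy. This is only a sketch using Python threads, not the kernel code: `final_normalize`, `release_last_pv`, `earlier_tiles_done`, and `acc_ready` are hypothetical names, `acc_ready` merely stands in for whatever the accumulator pipeline signals, and the last PV write is delayed on purpose so the stale read is reproducible instead of timing-dependent.

```python
import threading


def final_normalize(n_tiles, wait_before_read):
    """Toy model: an 'MMA' thread adds one PV contribution per tile into O,
    and the caller normalizes O by the tile count at the end."""
    o = [0.0]                                # stand-in for the accumulator O
    earlier_tiles_done = threading.Event()   # PV[0..N-2] have been written
    release_last_pv = threading.Event()      # artificial delay for PV[N-1]
    acc_ready = threading.Event()            # stand-in for the pipeline signal

    def mma():
        for tile in range(n_tiles):
            if tile == n_tiles - 1:
                # Hold back the last write so the race is deterministic here.
                earlier_tiles_done.set()
                release_last_pv.wait()
            o[0] += 1.0                      # each PV tile contributes 1.0
        acc_ready.set()                      # all writes to O are complete

    worker = threading.Thread(target=mma)
    worker.start()
    earlier_tiles_done.wait()

    if wait_before_read:
        release_last_pv.set()
        acc_ready.wait()                     # analogue of the explicit consumer wait
        snapshot = o[0]
    else:
        snapshot = o[0]                      # reads O before PV[N-1] has landed
        release_last_pv.set()

    worker.join()
    return snapshot / n_tiles


print(final_normalize(4, wait_before_read=True))
print(final_normalize(4, wait_before_read=False))
```

With the wait, the normalize sees all four contributions; without it, the read misses the last tile and the normalized value comes out low. In the real kernel the delay is not forced, which is why the unsynchronized read can appear correct on some runs.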