nvfp4-megamoe-kernel/dsv4/kernels
biondizzle cba41d500c D1.3: Fix critical bug - add TMEM column offset for P0 in PV GEMM
The softmax warps store P at tmem_p0_offset=32, so the PV MMA must read P from
the same offset. tOrP0 was missing the offset, causing the PV MMA to read from
TMEM column 0 (where S is) instead of column 32 (where P is).
This was the root cause of the NaN/zero outputs in the D1 tests.
2026-05-23 21:00:29 +00:00
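The effect of the missing offset can be illustrated with a toy model in plain Python. This is only a sketch, not the kernel code: TMEM is modelled as a list of rows with 64 columns, S sits at column 0 and P at column 32 as the commit message states, and the tile sizes, the input values, and the helper `pv_gemm` are hypothetical names chosen for illustration. The model reproduces wrong values when P is read from column 0; it does not attempt to reproduce the NaN/zero outputs seen in the D1 tests.

```python
import math

# Column offsets taken from the commit message.
TMEM_S_OFFSET = 0     # where the scores S live
TMEM_P0_OFFSET = 32   # where the softmax warps store P

# Hypothetical toy sizes, chosen small for illustration only.
TMEM_COLS = 64
N_KEYS = 4
HEAD_DIM = 2


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pv_gemm(tmem, p_offset, v):
    """Hypothetical stand-in for the PV MMA: read P from column p_offset, return P @ V."""
    out = []
    for row in tmem:
        p = row[p_offset:p_offset + N_KEYS]
        out.append([sum(p[k] * v[k][d] for k in range(N_KEYS))
                    for d in range(HEAD_DIM)])
    return out


scores = [[1.0, 2.0, 3.0, 4.0],
          [0.5, 0.5, 0.5, 0.5]]
v = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, -1.0]]

# Lay out S and P in the toy TMEM at their respective column offsets.
tmem = [[0.0] * TMEM_COLS for _ in scores]
for r, s in enumerate(scores):
    tmem[r][TMEM_S_OFFSET:TMEM_S_OFFSET + N_KEYS] = s
    tmem[r][TMEM_P0_OFFSET:TMEM_P0_OFFSET + N_KEYS] = softmax(s)

# Reference result computed without going through the toy TMEM.
reference = [[sum(p[k] * v[k][d] for k in range(N_KEYS))
              for d in range(HEAD_DIM)]
             for p in (softmax(s) for s in scores)]

buggy = pv_gemm(tmem, TMEM_S_OFFSET, v)   # offset missing: reads S instead of P
fixed = pv_gemm(tmem, TMEM_P0_OFFSET, v)  # offset applied: reads P

print("reference:", reference)
print("buggy:    ", buggy)
print("fixed:    ", fixed)
```

With the offset applied, the toy result matches the reference; without it, the GEMM multiplies the raw scores by V, so the output is no longer a softmax-weighted average of the rows of V.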