nvfp4-megamoe-kernel

Files

biondizzle cba41d500c D1.3: Fix critical bug - add TMEM column offset for P0 in PV GEMM

The softmax warps store P at tmem_p0_offset=32, so the PV MMA must read P
from the same offset. tOrP0 was built without that offset, so the PV MMA read
from TMEM column 0 (where S is) instead of column 32 (where P is).
This was the root cause of the NaN and zero outputs in the D1 tests.
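
A toy illustration of the failure mode described above, not the kernel code itself. It models TMEM as a flat Python list of columns. Only the value tmem_p0_offset=32 and the fact that S sits at column 0 come from the commit message; the helper names (store_tile, load_tile), the 64-column TMEM size, and the 32-column tile width are assumptions made for the sketch.

```python
# Toy model only: TMEM is a flat list of columns, not real tensor memory.
TMEM_COLS = 64        # assumed total column count for this sketch
TMEM_S_OFFSET = 0     # S starts at column 0 (per the commit message)
TMEM_P0_OFFSET = 32   # tmem_p0_offset=32 (per the commit message)
TILE_COLS = 32        # assumed tile width, hypothetical


def store_tile(tmem, offset, values):
    """Write a tile into TMEM starting at the given column offset."""
    tmem[offset:offset + len(values)] = values


def load_tile(tmem, offset, ncols):
    """Read ncols columns from TMEM starting at the given column offset."""
    return tmem[offset:offset + ncols]


tmem = [0.0] * TMEM_COLS

# QK GEMM output (raw scores S) lands at column 0.
s_tile = [float(i) for i in range(TILE_COLS)]
store_tile(tmem, TMEM_S_OFFSET, s_tile)

# Softmax warps write the probabilities P at column 32.
p_tile = [1.0 / TILE_COLS] * TILE_COLS
store_tile(tmem, TMEM_P0_OFFSET, p_tile)

# Before the fix: the PV operand is read with no offset, so it aliases S.
buggy_operand = load_tile(tmem, 0, TILE_COLS)

# After the fix: the PV operand is read at the offset the softmax wrote to.
fixed_operand = load_tile(tmem, TMEM_P0_OFFSET, TILE_COLS)

print("buggy read returns S:", buggy_operand == s_tile)
print("fixed read returns P:", fixed_operand == p_tile)
```

In this model the buggy read feeds unnormalized scores into the PV product instead of probabilities, which is the kind of operand mix-up the commit message gives as the root cause of the bad D1 outputs.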

2026-05-23 21:00:29 +00:00

cache

Flush compressor: schema fix, prepare_forward, flush_write kernels, state rotation

2026-05-22 00:25:47 +00:00

kernels

D1.3: Fix critical bug - add TMEM column offset for P0 in PV GEMM

2026-05-23 21:00:29 +00:00

layers

Fix layer construction: match existing API signatures, add RMSNorm impl

2026-05-21 23:31:58 +00:00

loader

Restructure: cutedsl/ -> dsv4/ with proper layering