nvfp4-megamoe-kernel/dsv4/kernels
biondizzle df7bc40d37 D1.3: Direct coordinate-indexed SMEM-P write using tTMEM_LOADcS coords
Each softmax thread writes its P values to sP using the (m,k) coordinates
from tTMEM_LOADcS. The k coordinate is decomposed into (k0,k1,k2) to
match sP's ((128,16),1,(4,2)) layout. CuTeDSL tensor indexing handles
the swizzle automatically. No make_tiled_copy needed.
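A minimal sketch of the k decomposition described above, in plain Python and independent of CuTeDSL. It assumes CuTe's colexicographic (leftmost-mode-fastest) coordinate ordering, and it assumes that k spans the full 16 * 4 * 2 = 128 K extent of sP's ((128,16),1,(4,2)) layout, so that k = k0 + 16*k1 + 64*k2. The helper names decompose_k and sp_coord are hypothetical and do not come from the kernel.

```python
# Hypothetical helpers (not from the kernel): split a flat K index into the
# (k0, k1, k2) coordinates of sP's ((128, 16), 1, (4, 2)) layout, ASSUMING
# CuTe's colexicographic (leftmost-mode-fastest) ordering.

MMA_M, MMA_K = 128, 16                   # mode 0 of sP: (128, 16)
REST_K = (4, 2)                          # mode 2 of sP: (4, 2)
K_TOTAL = MMA_K * REST_K[0] * REST_K[1]  # 128 columns of P


def decompose_k(k: int) -> tuple[int, int, int]:
    """Return (k0, k1, k2) such that k == k0 + 16*k1 + 64*k2."""
    if not 0 <= k < K_TOTAL:
        raise ValueError(f"k={k} out of range [0, {K_TOTAL})")
    k0 = k % MMA_K                        # fastest-varying: position inside the 16-wide atom
    k1 = (k // MMA_K) % REST_K[0]         # which of the 4 atoms
    k2 = k // (MMA_K * REST_K[0])         # which of the 2 outer groups
    return k0, k1, k2


def sp_coord(m: int, k: int):
    """Nested coordinate ((m, k0), 0, (k1, k2)) into sP for a flat (m, k)."""
    if not 0 <= m < MMA_M:
        raise ValueError(f"m={m} out of range [0, {MMA_M})")
    k0, k1, k2 = decompose_k(k)
    return ((m, k0), 0, (k1, k2))


if __name__ == "__main__":
    for k in (0, 15, 16, 63, 64, 127):
        print(k, decompose_k(k))
```

The nested tuple mirrors the shape of sP, which is the point of the approach: the coordinate names a logical element, and the tensor's layout (including any swizzle) maps it to a physical SMEM offset. Whether the real kernel orders (k1, k2) exactly this way is an assumption to check against the layout it builds.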
2026-05-23 23:19:21 +00:00