nvfp4-megamoe-kernel/src
biondizzle d90967d6e9 fix: SF remap — element-space K coords + zero-init dest buffer
Two fixes:
1. The CuTe layout takes element-space K coordinates, not group-space
   ones. With SFVecSize=16, k_group=3 corresponds to k_elem=48 in the
   layout, not k=3. Added an SFVecSize param to the remap kernel, which
   now multiplies k_sf * SFVecSize before passing the result to layout_sf().

2. Zero-init the CUTLASS dest buffer before the remap. The layout pads to
   tile boundaries (128x64), so dest is larger than M*K_sf. Unmapped
   padding slots held garbage, which caused sporadic wrong results.
   Also fixed the grid size to use the source count (M*K_sf), not the
   dest size.
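
Both fixes can be illustrated with a small host-side C++ sketch. It is not the kernel from this commit: toy_layout_sf, padded_sf_count and remap_sf are hypothetical names, the layout is a plain padded row-major one rather than the real (interleaved) CuTe layout, and reading the 128x64 tile as 128 rows by 64 K elements is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; none of this is CUTLASS API.
constexpr int kSFVecSize = 16;  // K elements covered by one scale factor
constexpr int kTileM = 128;     // assumed tile height in rows
constexpr int kTileK = 64;      // assumed tile width in K elements

constexpr int round_up(int x, int m) { return (x + m - 1) / m * m; }

// Toy layout that, like the layout described in the commit, expects an
// element-space K coordinate. Rows are padded out to the tile width.
std::size_t toy_layout_sf(int m, int k_elem, int K) {
    const int padded_k_sf = round_up(K, kTileK) / kSFVecSize;
    return std::size_t(m) * padded_k_sf + std::size_t(k_elem / kSFVecSize);
}

// Number of slots in the padded destination buffer.
std::size_t padded_sf_count(int M, int K) {
    return std::size_t(round_up(M, kTileM)) * (round_up(K, kTileK) / kSFVecSize);
}

// Remap row-major scale factors (M x K_sf) into the padded toy layout.
// Assumes K is a multiple of kSFVecSize.
std::vector<std::uint8_t> remap_sf(const std::vector<std::uint8_t>& src, int M, int K) {
    const int K_sf = K / kSFVecSize;
    const std::size_t n_src = std::size_t(M) * K_sf;
    assert(src.size() >= n_src);

    // Fix 2: zero-init, so padding slots never hold garbage.
    std::vector<std::uint8_t> dst(padded_sf_count(M, K), 0);

    // Fix 2 (grid size): iterate over the source count, not dst.size().
    for (std::size_t i = 0; i < n_src; ++i) {
        const int m = int(i / K_sf);
        const int k_sf = int(i % K_sf);
        // Fix 1: convert the group index to an element-space coordinate.
        const int k_elem = k_sf * kSFVecSize;
        dst[toy_layout_sf(m, k_elem, K)] = src[i];
    }
    return dst;
}
```

In this toy layout, passing k_sf=3 directly instead of k_elem=48 would land in column 0 of the row, because 3 / 16 truncates to 0. That is the kind of collision the first fix avoids.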
2026-05-14 14:54:18 +00:00