Files
nvfp4-megamoe-kernel/tests
biondizzle ab84ad0f86 feat: implement canonical UMMA SMEM layout with SWIZZLE_128B
Proper implementation of the SMEM layout that tcgen05.mma expects:
- SWIZZLE_128B (layout_type=2) for both MN-major A and K-major B
- Swizzle<3,4,3> applied to element offsets before SMEM write
- MN_SW128 atom: (1024, 8) BF16, stride (1, 1024)
- K_SW128 atom: (8, 1024) BF16, stride (1, 8)
- umma_smem_write/read functions for both MN and K major
- Descriptor with correct leading_byte_offset and stride_byte_offset

This is the RIGHT WAY. No shortcuts.
2026-05-28 08:18:47 +00:00
..