Files
nvfp4-megamoe-kernel/tests/unit
biondizzle ab84ad0f86 feat: implement canonical UMMA SMEM layout with SWIZZLE_128B
Proper implementation of the SMEM layout that tcgen05.mma expects:
- SWIZZLE_128B (layout_type=2) for both MN-major A and K-major B
- Swizzle<3,4,3> applied to element offsets before SMEM write
- MN_SW128 atom: (1024, 8) BF16, stride (1, 1024)
- K_SW128 atom: (8, 1024) BF16, stride (1, 8)
- umma_smem_write/read functions for both MN and K major
- Descriptor with correct leading_byte_offset and stride_byte_offset

This is the RIGHT WAY. No shortcuts.
2026-05-28 08:18:47 +00:00
..
2026-05-23 03:25:29 +00:00
2026-05-23 03:20:46 +00:00
2026-05-24 22:23:08 +00:00
2026-05-24 22:04:51 +00:00
2026-05-24 03:48:37 +00:00
2026-05-23 23:58:57 +00:00