Files
nvfp4-megamoe-kernel/dsv4/kernels
biondizzle ecbc75255c fix: correct UMMA descriptor format from CUTLASS source
The descriptor bitfield is completely different from what I assumed:
- [0,14) start_address (smem_ptr >> 4)
- [16,30) leading_byte_offset (row stride bytes >> 4)
- [32,46) stride_byte_offset
- [46,48) version = 1 (Blackwell)
- [61,64) layout_type (0=NONE, 2=128B, 4=64B, 6=32B)
- idescE = desc >> 32, passed as separate arg to MMA PTX

The 64-bit descriptor uses byte offsets (not log2 or element counts).
The upper 32 bits are reinterpreted by the MMA hardware as idescE.
2026-05-28 08:07:52 +00:00
..