Canonical UMMA layout for SWIZZLE_NONE:
- MN-major (128, 64): LBO=16, SBO=128 (from logical_divide Tile(1,8))
- K-major (128, 64): LBO=16, SBO=32 (from logical_divide Tile(8,2))

Using simple row-major SMEM layout (no swizzle, no permutation). Data is written directly to SMEM in row-major order. The descriptor strides describe the canonical layout.
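As a rough illustration of the row-major write described above, the sketch below computes the element offset of a coordinate in a (128, 64) tile and records the LBO/SBO pairs from the notes in a lookup table. The names `row_major_offset` and `DESC_STRIDES` are hypothetical and are not part of any library. The sketch does not model how the hardware interprets LBO and SBO, and it assumes nothing about their units.

```python
# Tile shape taken from the notes above: (128, 64).
TILE_ROWS, TILE_COLS = 128, 64

# LBO/SBO values for SWIZZLE_NONE, copied verbatim from the notes.
# Units are left unspecified here because the notes do not state them.
DESC_STRIDES = {
    "MN-major": {"LBO": 16, "SBO": 128},
    "K-major": {"LBO": 16, "SBO": 32},
}


def row_major_offset(row, col, rows=TILE_ROWS, cols=TILE_COLS):
    """Return the element offset of (row, col) in a plain row-major tile.

    Hypothetical helper: no swizzle and no permutation, so consecutive
    columns of a row are adjacent and rows are `cols` elements apart.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError(f"({row}, {col}) is outside a {rows}x{cols} tile")
    return row * cols + col
```

For example, `row_major_offset(1, 0)` is one full row (64 elements) past the tile base, and the last element of the tile sits at offset `128 * 64 - 1`.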