- LBO = BLOCK_MN * 16 bytes, SBO = 128 bytes (K-major, swizzle NONE)
- Canonical SMEM layout: core matrices interleaved in column-major order
- idesc is a SEPARATE 32-bit value (deriving it as desc_a >> 32 was WRONG)
- idesc encodes dtype/atype/btype/MMA_M/MMA_N
- This was the root cause of the 'misaligned address' errors
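A minimal Python sketch of the layout arithmetic above. It assumes 8-row × 16-byte core matrices, with LBO stepping along K and SBO stepping along MN, and it assumes a partial 64-bit smem descriptor whose start address, LBO and SBO are stored as `(value >> 4)` in 14-bit fields at bits 0, 16 and 32. The remaining fields (version, base offset, swizzle mode) are left at zero here. All names (`smem_offset`, `pack_smem_desc`, etc.) are hypothetical helpers for illustration, not a library API.

```python
# Hypothetical sketch: canonical K-major, no-swizzle SMEM layout for one operand tile.
CORE_ROWS = 8        # rows (MN direction) per core matrix (assumed)
CORE_ROW_BYTES = 16  # contiguous bytes (K direction) per core-matrix row (assumed)


def lbo_sbo(block_mn):
    """LBO/SBO in bytes, as in the notes: LBO steps along K, SBO steps along MN."""
    lbo = block_mn * CORE_ROW_BYTES     # skip a full column of core matrices
    sbo = CORE_ROWS * CORE_ROW_BYTES    # 128 bytes: next core matrix along MN
    return lbo, sbo


def smem_offset(mn, k, block_mn, elem_bytes):
    """Byte offset of element (mn, k). Core matrices are stored column-major:
    consecutive along MN first, then along K."""
    lbo, sbo = lbo_sbo(block_mn)
    k_bytes = k * elem_bytes
    return ((mn // CORE_ROWS) * sbo                # which core matrix along MN
            + (mn % CORE_ROWS) * CORE_ROW_BYTES    # row inside the core matrix
            + (k_bytes // CORE_ROW_BYTES) * lbo    # which core matrix along K
            + k_bytes % CORE_ROW_BYTES)            # byte inside the 16-byte row


def pack_smem_desc(start_addr, lbo, sbo):
    """Assumed partial smem descriptor: 14-bit (value >> 4) fields at bits 0, 16, 32.
    Version, base offset and swizzle bits are deliberately left as 0."""
    def field(v):
        return (v >> 4) & 0x3FFF
    return field(start_addr) | (field(lbo) << 16) | (field(sbo) << 32)


# Usage: 128-row bf16 tile (elem_bytes = 2).
lbo, sbo = lbo_sbo(128)              # (2048, 128)
desc_a = pack_smem_desc(0x400, lbo, sbo)
bogus_idesc = desc_a >> 32           # just the SBO field (128 >> 4 == 8), not an idesc
```

Under these assumptions, `desc_a >> 32` yields the upper half of the smem descriptor (here only the encoded SBO), so reading it as an instruction descriptor produces meaningless dtype/atype/btype/M/N fields. This is why idesc has to be built as its own 32-bit value.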