The C++ transform function expects int32 input (the kInt type), with four UE4M3 bytes packed into each int32. We therefore pack first, then run the transform with recipe (1, 16), which handles TMA alignment and the UTCCP transpose.
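The packing step can be illustrated with a minimal, standard-library-only sketch. The helper name `pack_sf_bytes` is hypothetical, and the little-endian byte order (first byte of each group in the least-significant 8 bits) is an assumption rather than something stated here; it matches what reinterpreting a contiguous byte buffer as int32 would give on a little-endian host. The sketch covers only the packing, not the layout transform itself.

```python
import struct


def pack_sf_bytes(sf_bytes):
    """Pack groups of 4 one-byte scale factors into signed 32-bit ints.

    Assumption: little-endian order, so the first byte of each group
    ends up in the least-significant 8 bits of the resulting int32.
    """
    if len(sf_bytes) % 4 != 0:
        raise ValueError("number of scale-factor bytes must be a multiple of 4")
    count = len(sf_bytes) // 4
    # "<" = little-endian, "i" = signed 32-bit integer
    return list(struct.unpack(f"<{count}i", bytes(sf_bytes)))


# Four arbitrary example bytes become one int32.
packed = pack_sf_bytes(bytes([0x38, 0x40, 0x48, 0x50]))
```

Because the result is a signed int32, a group whose last byte has its top bit set packs to a negative value; the bit pattern is unchanged.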