1. sizeof_bits_v → sizeof_bits<T>::value (our CUTLASS lacks the C++17 _v form)
2. reinterpret_cast<uint8_t*> for TMA copy and UMMA desc calls (smem_a returns float_e2m1_t* but the templates expect uint8_t*)
3. kNumChunks extended to 4 (packed FP4 halved SMEM, need more chunks)
4. No code changes to PatternVisitor; all fixes are at call sites
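
A minimal host-side sketch of the pattern behind fixes 1 and 2, for illustration only. It does not use CUTLASS: `fp4_stub_t`, `sizeof_bits_stub`, `bytes_for`, `copy_bytes_stub`, and `copy_fp4_tile` are hypothetical stand-ins, not names from the actual code. It assumes the callee takes `uint8_t*` and that two FP4 values pack into one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the packed FP4 element type (the real
// float_e2m1_t lives in CUTLASS and is not reproduced here).
struct fp4_stub_t { std::uint8_t storage; };

// Stand-in trait exposing only the ::value form, as in fix 1.
template <typename T>
struct sizeof_bits_stub {
    static constexpr int value = static_cast<int>(sizeof(T)) * 8;
};

// Sub-byte specialization: one FP4 element occupies 4 bits.
template <>
struct sizeof_bits_stub<fp4_stub_t> {
    static constexpr int value = 4;
};

// Bytes needed for `count` elements, written with ::value instead of _v.
template <typename T>
constexpr std::size_t bytes_for(std::size_t count) {
    return count * sizeof_bits_stub<T>::value / 8;
}

// Hypothetical callee whose signature is fixed to uint8_t*.
inline std::size_t copy_bytes_stub(std::uint8_t* dst,
                                   const std::uint8_t* src,
                                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
    return n;
}

// Fix 2 in miniature: cast at the call site, leave the callee untouched.
inline std::size_t copy_fp4_tile(fp4_stub_t* dst,
                                 const fp4_stub_t* src,
                                 std::size_t elems) {
    return copy_bytes_stub(reinterpret_cast<std::uint8_t*>(dst),
                           reinterpret_cast<const std::uint8_t*>(src),
                           bytes_for<fp4_stub_t>(elems));
}
```

Keeping the cast in a thin call-site wrapper mirrors item 4: the shared template code stays unchanged and only the FP4-specific callers know about the pointer type mismatch.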