DeepGEMM/deep_gemm
biondizzle d6551617c0 fix: 4 kernel compilation fixes for packed FP4
1. sizeof_bits_v → sizeof_bits<T>::value (our CUTLASS lacks the C++17 _v form)
2. reinterpret_cast<uint8_t*> added for the TMA copy and UMMA desc calls
   (smem_a returns float_e2m1_t*, but the templates expect uint8_t*)
3. kNumChunks extended to 4 (packed FP4 halved SMEM, so more chunks are needed)
4. No code changes to PatternVisitor; all fixes are at the call sites
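
Fixes 1 and 2 can be sketched in isolation as below. This is a minimal illustration, not the DeepGEMM or CUTLASS code: `sizeof_bits` and `float_e2m1_t` here are simplified stand-ins defined locally, and `bytes_for`, `first_byte`, and `first_byte_of_fp4` are hypothetical helpers that take the place of the real TMA copy and UMMA descriptor calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a packed FP4 storage type:
// one byte of storage holds two 4-bit values.
struct float_e2m1_t {
    std::uint8_t storage;
};

// Hypothetical stand-in for a sizeof_bits trait that only provides
// the ::value member (no _v variable template).
template <typename T>
struct sizeof_bits {
    static constexpr int value = static_cast<int>(sizeof(T) * 8);
};

// One logical FP4 element occupies 4 bits, not sizeof(T) * 8.
template <>
struct sizeof_bits<float_e2m1_t> {
    static constexpr int value = 4;
};

// Fix 1: spell the trait as sizeof_bits<T>::value instead of sizeof_bits_v<T>.
// Returns the number of bytes needed for num_elems elements, rounded up.
template <typename T>
constexpr std::size_t bytes_for(std::size_t num_elems) {
    return (num_elems * sizeof_bits<T>::value + 7) / 8;
}

// Hypothetical byte-oriented helper, standing in for a template
// that expects a uint8_t pointer.
inline std::uint8_t first_byte(const std::uint8_t* ptr) {
    return ptr[0];
}

// Fix 2: the buffer is typed float_e2m1_t*, so cast at the call site
// rather than changing the callee.
inline std::uint8_t first_byte_of_fp4(const float_e2m1_t* smem_a) {
    return first_byte(reinterpret_cast<const std::uint8_t*>(smem_a));
}
```

With these stand-ins, `bytes_for<float_e2m1_t>(128)` evaluates to 64 while `bytes_for<std::uint8_t>(128)` evaluates to 128, which is the halving that packed FP4 storage implies in this sketch.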
2026-05-11 23:17:51 +00:00