kPackedFP4 is defined as torch::kInt8, so the separate kInt8 case was a duplicate of the kPackedFP4 case. The real fix was in mega_nvfp4.hpp: changing kUInt8→kInt8 so the tensor dtype matches the existing kPackedFP4 path in the TMA switch.
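
A minimal sketch of why the extra case is redundant and why the dtype change matters. It does not use libtorch: ScalarType, tma_element_bytes, the bfloat16 branch, and the throwing default are hypothetical stand-ins, and only the relationship kPackedFP4 == kInt8 is taken from the note above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the torch dtype enum; not the real c10::ScalarType.
enum class ScalarType : std::int8_t { kUInt8, kInt8, kBFloat16 };

// As stated in the note: packed FP4 is an alias for the int8 dtype.
constexpr ScalarType kPackedFP4 = ScalarType::kInt8;

static_assert(kPackedFP4 == ScalarType::kInt8, "alias must match kInt8");

// Hypothetical dtype switch, loosely modelled on a TMA descriptor setup.
// Returns the size in bytes of one stored element (assumed behaviour).
inline int tma_element_bytes(ScalarType dtype) {
    switch (dtype) {
        case kPackedFP4:            // same value as ScalarType::kInt8
            return 1;
        // case ScalarType::kInt8:  // would not compile: duplicate case value
        //     return 1;
        case ScalarType::kBFloat16:
            return 2;
        default:                    // kUInt8 lands here in this sketch
            throw std::invalid_argument("unsupported dtype for TMA");
    }
}
```

In this sketch a tensor tagged kUInt8 misses the kPackedFP4 branch and falls through to the default, while one tagged kInt8 takes the existing branch with no new case needed. That mirrors the reasoning for making the change at the tensor-creation site rather than in the switch.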