The 16U4_ALIGN8B TMA data type is not supported on this driver; it is rejected with CUDA_ERROR_INVALID_VALUE. Instead, use a UINT8 TMA to load the raw bytes and let the UMMA descriptor interpret SMEM as packed FP4 for mxf4nvf4. The TMA dimensions are then specified in bytes, as for any UINT8 tensor, not in FP4 elements.
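
As a rough illustration of what "dimensions in bytes" means, the sketch below converts an FP4 element count into the byte extent a UINT8 tensor map would describe. The helper name fp4_as_uint8_extents and the ByteExtents struct are hypothetical (not part of CUDA). It assumes a row-major matrix with two 4-bit values packed per byte and an even element count along the inner dimension. The resulting extents would be the ones passed to the tensor-map encode call with the UINT8 data type.

```cpp
#include <cstdint>

// Hypothetical helper type (not a CUDA API): extents of a packed-FP4 matrix
// as seen by a TMA tensor map declared with the UINT8 data type.
struct ByteExtents {
    std::uint64_t inner_bytes;  // innermost dimension, counted in bytes
    std::uint64_t rows;         // outer dimension, unchanged by packing
};

// Assumption: row-major [rows, k_elems] FP4 data, two 4-bit values per byte,
// k_elems even. TMA sees plain bytes; only the UMMA descriptor treats the
// bytes in shared memory as FP4.
constexpr ByteExtents fp4_as_uint8_extents(std::uint64_t rows,
                                           std::uint64_t k_elems) {
    return ByteExtents{k_elems / 2, rows};
}

// Example: a 128 x 256 FP4 tile is described to TMA as 128 rows of 128 bytes.
static_assert(fp4_as_uint8_extents(128, 256).inner_bytes == 128,
              "256 FP4 values occupy 128 bytes");
static_assert(fp4_as_uint8_extents(128, 256).rows == 128,
              "row count is not affected by packing");
```

The same halving applies to the box (tile) size along the packed dimension: a tile of N FP4 elements corresponds to N / 2 bytes in the UINT8 tensor map.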