DeepGEMM/csrc
biondizzle c56f5dda7e fix: use UINT8 TMA for packed FP4 instead of 16U4_ALIGN8B
The 16U4_ALIGN8B TMA data type is not supported on this driver
(CUDA_ERROR_INVALID_VALUE). Use UINT8 TMA to load the raw bytes and let
the UMMA descriptor interpret SMEM as packed FP4 for mxf4nvf4.
TMA dimensions are therefore given in bytes, as for any UINT8 tensor
(two packed FP4 values per byte).
2026-05-12 18:05:11 +00:00
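A minimal sketch of the byte arithmetic this change implies for a 2D row-major tensor of packed FP4 values. It assumes dense rows with no padding, so the row stride equals the row length in bytes. `packed_fp4_tma_shape` and `Uint8TmaShape` are hypothetical names, not DeepGEMM code. The checks mirror documented tensor-map constraints for non-interleaved layouts: global strides must be multiples of 16 bytes, the inner box extent times the element size must be a multiple of 16 bytes, and each box dimension must be at most 256.

```python
from dataclasses import dataclass

FP4_PER_BYTE = 2  # two 4-bit FP4 values are packed into each byte


@dataclass(frozen=True)
class Uint8TmaShape:
    """Hypothetical container for a byte-based 2D UINT8 tensor-map shape."""
    global_dims: tuple   # (inner extent in bytes, number of rows)
    global_stride: int   # bytes between consecutive rows
    box_dims: tuple      # (inner box extent in bytes, box rows)


def packed_fp4_tma_shape(rows, cols_fp4, box_rows, box_cols_fp4):
    """Convert FP4 element counts into the byte-based extents a UINT8 TMA expects.

    Assumption: rows are dense (no padding), so the row stride equals the
    packed row length in bytes.
    """
    if cols_fp4 % FP4_PER_BYTE or box_cols_fp4 % FP4_PER_BYTE:
        raise ValueError("FP4 column counts must be even to fill whole bytes")
    inner_bytes = cols_fp4 // FP4_PER_BYTE
    box_inner_bytes = box_cols_fp4 // FP4_PER_BYTE
    # Global strides must be a multiple of 16 bytes.
    if inner_bytes % 16:
        raise ValueError("row stride must be a multiple of 16 bytes")
    # With a 1-byte element type, the inner box extent itself must be a multiple of 16.
    if box_inner_bytes % 16:
        raise ValueError("inner box extent must be a multiple of 16 bytes")
    # Every box dimension is limited to 256 elements (here: bytes or rows).
    if box_inner_bytes > 256 or box_rows > 256:
        raise ValueError("box dimensions must not exceed 256")
    return Uint8TmaShape(
        global_dims=(inner_bytes, rows),
        global_stride=inner_bytes,
        box_dims=(box_inner_bytes, box_rows),
    )
```

For example, a 256 x 7168 FP4 matrix loaded in 128 x 128 FP4 tiles becomes a 3584-byte-wide UINT8 tensor with 64-byte-wide boxes. The UMMA descriptor, not the TMA, then treats those bytes as packed FP4.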