DeepGEMM/deep_gemm/mega
biondizzle acae75e109 fix: use transform_sf_into_required_layout for proper TMA-aligned SF
Instead of the custom _pack_nvfp4_sf_for_utccp helper, use DeepGEMM's C++
transform_sf_into_required_layout with recipe (1, 1, 16) for NVFP4.
It handles TMA alignment and the UTCCP transpose of the scale factors (SF) correctly.
2026-05-11 06:54:34 +00:00
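A rough sketch of what the recipe (1, 1, 16) implies for scale-factor shapes, not DeepGEMM's implementation. It assumes the recipe reads as (MN granularity for A, MN granularity for B, K granularity), and that a TMA-aligned layout pads the contiguous dimension to a multiple of 16 bytes. The helper names sf_shape and tma_padded_extent are hypothetical and do not exist in DeepGEMM.

```python
import math


def sf_shape(mn, k, gran_mn=1, gran_k=16):
    """Shape of a scale-factor tensor holding one scale per
    (gran_mn x gran_k) block of an (mn x k) operand.

    Hypothetical helper for illustration only.
    """
    return (math.ceil(mn / gran_mn), math.ceil(k / gran_k))


def tma_padded_extent(extent, elem_bytes, align_bytes=16):
    """Round `extent` (in elements) up so the row length in bytes is a
    multiple of `align_bytes`.

    Assumption: 16-byte alignment of the contiguous dimension, with
    `elem_bytes` dividing `align_bytes`.
    """
    elems_per_unit = align_bytes // elem_bytes
    return math.ceil(extent / elems_per_unit) * elems_per_unit


if __name__ == "__main__":
    recipe = (1, 1, 16)  # assumed meaning: (gran_mn_a, gran_mn_b, gran_k)
    m, n, k = 100, 256, 4096

    sfa = sf_shape(m, k, recipe[0], recipe[2])  # one scale per row, per 16 K-elements
    sfb = sf_shape(n, k, recipe[1], recipe[2])
    print("SFA shape:", sfa, "SFB shape:", sfb)

    # With 1-byte scale factors stored MN-major, the MN extent would be padded.
    print("padded M extent:", tma_padded_extent(m, elem_bytes=1))
```

The sketch covers only the shape arithmetic. The UTCCP transpose mentioned in the commit message is a separate reordering step and is not modeled here.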