[feat]: CUTLASS block scaled group gemm for SM100 (#19757)

Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
Duncan Moss
2025-07-04 11:58:04 -07:00
committed by GitHub
parent 2f35a022e6
commit 3d184b95b8
13 changed files with 726 additions and 30 deletions

@@ -38,7 +38,6 @@
#include "cute/atom/mma_atom.hpp"
#include "cute/atom/copy_traits_sm90_tma.hpp"
#include "cute/algorithm/gemm.hpp"
#include "cute/tensor_predicate.hpp"
#include "cute/numeric/arithmetic_tuple.hpp"
#include "cutlass/pipeline/pipeline.hpp"
#include "cutlass/transform/collective/sm90_wgmma_transpose.hpp"