fix: SF remap uses cute::cosize() instead of cute::size()

The comment explicitly warned about this: the allocation uses cosize (the
physical size, including tile padding), but the iteration bound used size
(the logical size). As a result, padding positions in the CUTLASS SF layout
were never written and stayed zero instead of holding their actual SF
values. With uniform data (all-ones), every SF value is the same, so the
bug was invisible. With random data, different positions need different
SF values, and the missing writes corrupt the result.
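
To illustrate the size/cosize gap described above, here is a minimal
sketch in plain C++. It does not use CuTe: `Layout2D`, `logical_size`,
`codomain_size`, and `kPadded` are hypothetical stand-ins, and the
3x4 shape with a row pitch of 8 is an assumed example, not the actual
SF layout from this kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-D strided layout (NOT the CuTe API).
struct Layout2D {
    std::size_t rows, cols;              // logical shape
    std::size_t row_stride, col_stride;  // offsets per step in each mode
};

// Logical size: number of coordinates in the domain (product of the shape).
constexpr std::size_t logical_size(const Layout2D& l) {
    return l.rows * l.cols;
}

// Codomain size: largest offset the layout can produce, plus one.
// Assumes non-negative strides, which holds for this sketch.
constexpr std::size_t codomain_size(const Layout2D& l) {
    return logical_size(l) == 0
        ? 0
        : (l.rows - 1) * l.row_stride + (l.cols - 1) * l.col_stride + 1;
}

// Assumed example: 3 rows of 4 elements, each row padded to a pitch of 8.
constexpr Layout2D kPadded{3, 4, 8, 1};

// A buffer sized by the codomain holds 20 slots, but a loop bounded by
// the logical size only counts 12, so the two bounds disagree by 8.
static_assert(logical_size(kPadded) == 12, "3 * 4 logical elements");
static_assert(codomain_size(kPadded) == 20, "2*8 + 3*1 + 1 offsets");
```

Whenever the strides leave gaps, the two quantities differ, so a
buffer allocated with one and a loop bounded by the other will not
agree on the element count.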
2026-05-15 18:52:23 +00:00
parent 67dcfa83f5
commit c3841983a0

@@ -125,7 +125,7 @@ __global__ void remap_sf_to_cutlass_kernel(
     bool col_major_src = false  // true if source is (K_sf, MN) row-major
 ) {
     int dst_idx = blockIdx.x * blockDim.x + threadIdx.x;
-    int total = cute::size(layout_sf);
+    int total = cute::cosize(layout_sf);
     if (dst_idx >= total) return;
     auto coord = cute::idx2crd(dst_idx, layout_sf.shape(), layout_sf.stride());