fix: SF remap uses cute::cosize() instead of cute::size()
A comment in the code already warned about this: the buffer is allocated with cosize (the physical size, including tile padding), but the iteration bound used size (the logical size). As a result, padding positions in the CUTLASS SF layout were never written and stayed zero instead of holding their actual SF values. With uniform data (all ones), every SF value is the same, so the bug was invisible. With random data, different positions need different SF values, and the missing writes corrupt the result.
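To illustrate the size/cosize distinction described above, here is a minimal sketch in plain C++. It does not use CuTe: `ToyLayout`, `offsets_missed`, `kPadded`, and `check_example` are hypothetical names made up for this example, and the 3 x 2 shape with a padded leading dimension of 4 is an assumed toy case, not the real SF layout. It only models the idea that a loop over physical offsets must be bounded by the physical extent rather than by the logical element count.

```cpp
#include <cassert>

// Toy 2-D layout: coordinate (i, j) maps to offset i*stride0 + j*stride1.
// Hand-written stand-in for illustration only; this is NOT the CuTe API.
struct ToyLayout {
    int shape0, shape1;
    int stride0, stride1;

    // Logical element count (the role cute::size plays for a layout).
    constexpr int size() const { return shape0 * shape1; }

    // Largest reachable offset plus one (the role cute::cosize plays).
    // Assumes non-negative strides and non-empty shapes.
    constexpr int cosize() const {
        return (shape0 - 1) * stride0 + (shape1 - 1) * stride1 + 1;
    }

    constexpr int offset(int i, int j) const { return i * stride0 + j * stride1; }
};

// Counts layout offsets that a loop over physical offsets [0, bound) never visits.
inline int offsets_missed(const ToyLayout& l, int bound) {
    int missed = 0;
    for (int j = 0; j < l.shape1; ++j)
        for (int i = 0; i < l.shape0; ++i)
            if (l.offset(i, j) >= bound) ++missed;
    return missed;
}

// Assumed example: 3 x 2, column-major, leading dimension padded from 3 to 4.
constexpr ToyLayout kPadded{3, 2, 1, 4};
static_assert(kPadded.size() == 6, "logical element count");
static_assert(kPadded.cosize() == 7, "physical extent including padding");

// Runtime check of the two candidate loop bounds.
inline void check_example() {
    assert(offsets_missed(kPadded, kPadded.size()) == 1);    // offset 6 is skipped
    assert(offsets_missed(kPadded, kPadded.cosize()) == 0);  // every offset is covered
}
```

In this toy case the layout reaches offsets 0, 1, 2, 4, 5, and 6, so a bound of 6 (the logical size) stops one position short, while a bound of 7 (the physical extent) covers them all. Whether the real SF layout behaves the same way depends on its actual shape and strides, which this sketch does not reproduce.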
@@ -125,7 +125,7 @@ __global__ void remap_sf_to_cutlass_kernel(
     bool col_major_src = false // true if source is (K_sf, MN) row-major
 ) {
     int dst_idx = blockIdx.x * blockDim.x + threadIdx.x;
-    int total = cute::size(layout_sf);
+    int total = cute::cosize(layout_sf);
     if (dst_idx >= total) return;
 
     auto coord = cute::idx2crd(dst_idx, layout_sf.shape(), layout_sf.stride());