m = f0 + f1*32 + f2*128   (CuTe 'first sub varies fastest')
k_sf = f4 + f5*4
f3 is the Step<2> stride (degenerate, always=total), NOT a coordinate.
Previous formula (f3*2+f2)*128 was catastrophically wrong: it mapped everything to m=0 or m=huge.
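A minimal sketch of the decode described above, for checking the arithmetic by hand. The function names decode_m_k and old_m are hypothetical, and the extents used in the coverage check (f0 in [0, 32), f1 in [0, 4)) are an assumption inferred from the strides 32 and 128; the note itself does not state them. The f3 value of 4096 is likewise just a stand-in for "total".

```python
def decode_m_k(f):
    """Decode a flat 6-tuple (f0..f5) into (m, k_sf) using the corrected formula."""
    f0, f1, f2, _f3, f4, f5 = f  # f3 is ignored: it is not a coordinate
    m = f0 + f1 * 32 + f2 * 128  # first sub-coordinate varies fastest
    k_sf = f4 + f5 * 4
    return m, k_sf


def old_m(f):
    """The previous (wrong) formula, kept only for comparison."""
    _f0, _f1, f2, f3, _f4, _f5 = f
    return (f3 * 2 + f2) * 128


# Assumed extents: f0 in [0, 32), f1 in [0, 4); f2 indexes 128-row blocks.
covered = sorted(
    decode_m_k((f0, f1, f2, 0, 0, 0))[0]
    for f2 in range(2)
    for f1 in range(4)
    for f0 in range(32)
)

example = decode_m_k((5, 3, 2, 0, 1, 7))
old_zero = old_m((5, 3, 0, 0, 1, 7))     # f3 = 0, f2 = 0
old_huge = old_m((5, 3, 0, 4096, 1, 7))  # f3 = hypothetical "total"
```

Under those assumed extents the corrected formula visits each m in 0..255 exactly once, while the old formula discards f0 and f1 entirely and scales f3 by 256, which is consistent with the m=0 or m=huge behaviour noted above.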