CuTeDSL caches compiled kernels by (M, N, K) shape, so each new layer shape (L1 vs L2, different expert counts) triggers a fresh compile. That means the warmup call can't be skipped; only the print spam can be suppressed. The flag now gates the message, not the warmup.
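
A minimal sketch of the behavior described here, assuming a plain dict keyed by (M, N, K). This is not the CuTeDSL API: `ShapeKeyedKernelCache`, `fake_compile`, and the `verbose` flag are hypothetical names used only for illustration. The warmup always runs for an unseen shape, and the flag only controls whether the message is printed.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, int, int]  # (M, N, K)


class ShapeKeyedKernelCache:
    """Toy stand-in for a per-shape compile cache (hypothetical, not CuTeDSL)."""

    def __init__(self, compile_fn: Callable[[Shape], Callable[[], str]],
                 verbose: bool = False) -> None:
        self._compile_fn = compile_fn
        self._verbose = verbose  # hypothetical flag: gates the message only
        self._kernels: Dict[Shape, Callable[[], str]] = {}
        self.compile_count = 0

    def warmup(self, m: int, n: int, k: int) -> Callable[[], str]:
        shape = (m, n, k)
        if shape not in self._kernels:
            # The compile itself is never skipped for a new shape.
            if self._verbose:
                print(f"compiling kernel for shape {shape}")
            self._kernels[shape] = self._compile_fn(shape)
            self.compile_count += 1
        return self._kernels[shape]


def fake_compile(shape: Shape) -> Callable[[], str]:
    # Placeholder for the real (expensive) compile step.
    return lambda: f"kernel{shape}"


cache = ShapeKeyedKernelCache(fake_compile, verbose=False)
cache.warmup(128, 256, 64)  # first shape: compiles silently
cache.warmup(128, 256, 64)  # same shape: cache hit, no compile
cache.warmup(128, 512, 64)  # different shape: compiles again
```

With `verbose=False` the two compiles still happen; only the log line is dropped.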