cutlass.range traces once - kv_coord/kt are trace-time values, not runtime loop-carried state. Python range() fully unrolls at trace time, emitting distinct Int32(k) constants per iteration. Int32(1) hardcoded already proved TMA CAN load from tile 1.