CuTeDSL @cute.kernel cannot handle dynamic-shape tensors as parameters, so pass swa_len as an Int32 scalar instead of a 1D tensor. This works for batch_size=1 (the current config). Updated the D3 and D4 tests to pass swa_len as an int.