[Hardware] Replace torch.cuda.synchronize() API with torch.accelerator.synchronize() (#36085)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-05 18:36:39 +08:00
parent 0bfa229bf1
commit 66a2209645
59 changed files with 158 additions and 161 deletions
--- a/docs/design/model_runner_v2.md
+++ b/docs/design/model_runner_v2.md
@@ -50,7 +50,7 @@ V1 was not originally designed with async scheduling in mind, and support requir

 ## 3. Removing Async Barrier

-A key requirement for async execution is that CPU operations remain non-blocking. Both explicit sync (for example, `torch.cuda.synchronize`) and implicit sync (for example, unpinned `.to("cuda")`) must be avoided.
+A key requirement for async execution is that CPU operations remain non-blocking. Both explicit sync (for example, `torch.accelerator.synchronize`) and implicit sync (for example, unpinned `.to("cuda")`) must be avoided.

 However, async execution can introduce race conditions when CPU and GPU concurrently touch the same memory.

--- a/docs/usage/troubleshooting.md
+++ b/docs/usage/troubleshooting.md
@@ -95,7 +95,7 @@ If GPU/CPU communication cannot be established, you can use the following Python
    torch.cuda.set_device(local_rank)
    data = torch.FloatTensor([1,] * 128).to("cuda")
    dist.all_reduce(data, op=dist.ReduceOp.SUM)
-    torch.cuda.synchronize()
+    torch.accelerator.synchronize()
    value = data.mean().item()
    world_size = dist.get_world_size()
    assert value == world_size, f"Expected {world_size}, got {value}"
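
Note: this change calls `torch.accelerator.synchronize()` directly. For code that may also have to run on a PyTorch build without the `torch.accelerator` namespace, one option is a small fallback shim. The sketch below is illustrative only: `resolve_synchronize` is a hypothetical helper name, not part of this commit or of vLLM, and it uses stand-in objects instead of a real `torch` import so that it runs anywhere.

```python
from types import SimpleNamespace


def resolve_synchronize(torch_module):
    """Return a device-synchronize callable from a torch-like module.

    Hypothetical helper: prefers `accelerator.synchronize` when the
    namespace is present and falls back to `cuda.synchronize` otherwise.
    """
    accelerator = getattr(torch_module, "accelerator", None)
    sync = getattr(accelerator, "synchronize", None)
    if callable(sync):
        return sync
    return torch_module.cuda.synchronize


# Stand-in modules so the sketch runs without PyTorch installed.
calls = []
new_torch = SimpleNamespace(
    accelerator=SimpleNamespace(synchronize=lambda: calls.append("accelerator")),
    cuda=SimpleNamespace(synchronize=lambda: calls.append("cuda")),
)
old_torch = SimpleNamespace(
    cuda=SimpleNamespace(synchronize=lambda: calls.append("cuda")),
)

resolve_synchronize(new_torch)()  # takes the accelerator path
resolve_synchronize(old_torch)()  # falls back to the CUDA path
print(calls)  # ['accelerator', 'cuda']
```

With a real installation, the equivalent call would be `resolve_synchronize(torch)()`. Whether such a shim is needed depends on the minimum PyTorch version the project supports, which this diff does not state.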