[Hardware] Replace torch.cuda.synchronize() API with torch.accelerator.synchronize() (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
@@ -50,7 +50,7 @@ V1 was not originally designed with async scheduling in mind, and support requir

 ## 3. Removing Async Barrier

-A key requirement for async execution is that CPU operations remain non-blocking. Both explicit sync (for example, `torch.cuda.synchronize`) and implicit sync (for example, unpinned `.to("cuda")`) must be avoided.
+A key requirement for async execution is that CPU operations remain non-blocking. Both explicit sync (for example, `torch.accelerator.synchronize`) and implicit sync (for example, unpinned `.to("cuda")`) must be avoided.

 However, async execution can introduce race conditions when CPU and GPU concurrently touch the same memory.
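To make the explicit/implicit distinction concrete, here is a minimal illustrative sketch (not part of the diff). It assumes PyTorch 2.6 or newer, where the device-agnostic `torch.accelerator` namespace exists, and a machine with at least one accelerator attached:

```python
import torch

# Assumption: PyTorch >= 2.6 with an accelerator (CUDA, XPU, ...) present.
assert torch.accelerator.is_available(), "no accelerator found"
device = torch.accelerator.current_accelerator()

# Implicit sync: copying from unpinned (pageable) host memory blocks the CPU
# until the transfer finishes -- the pattern the doc says to avoid.
host = torch.randn(1024)
blocking_copy = host.to(device)

# Non-blocking alternative: pin the host buffer so the copy can be queued
# and overlapped with other CPU work.
pinned = host.pin_memory()
overlapped_copy = pinned.to(device, non_blocking=True)

# Explicit sync, device-agnostic: blocks the CPU until all queued accelerator
# work drains. This is the call the commit substitutes for torch.cuda.synchronize().
torch.accelerator.synchronize()
```

The point of the substitution is that `torch.accelerator.synchronize()` behaves the same way on any supported backend, so docs and examples no longer assume CUDA specifically.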
@@ -95,7 +95,7 @@ If GPU/CPU communication cannot be established, you can use the following Python
 torch.cuda.set_device(local_rank)
 data = torch.FloatTensor([1,] * 128).to("cuda")
 dist.all_reduce(data, op=dist.ReduceOp.SUM)
-torch.cuda.synchronize()
+torch.accelerator.synchronize()
 value = data.mean().item()
 world_size = dist.get_world_size()
 assert value == world_size, f"Expected {world_size}, got {value}"
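Per the hunk header, this snippet is a communication sanity check. A self-contained version might look like the sketch below; the file name, launch command, and process-group setup are assumptions for illustration, not part of the diff:

```python
# sanity_check.py -- hypothetical standalone version of the patched snippet.
# Assumes PyTorch >= 2.6 and a torchrun-style launcher that sets LOCAL_RANK:
#   torchrun --nproc-per-node=<num_gpus> sanity_check.py
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

# Every rank contributes a vector of ones; after the sum all-reduce, each
# element should equal the number of participating ranks.
data = torch.FloatTensor([1,] * 128).to("cuda")
dist.all_reduce(data, op=dist.ReduceOp.SUM)

# Device-agnostic barrier: wait for the collective to finish on the device
# before reading the result back on the CPU.
torch.accelerator.synchronize()

value = data.mean().item()
world_size = dist.get_world_size()
assert value == world_size, f"Expected {world_size}, got {value}"

print(f"rank {dist.get_rank()}: all_reduce sanity check passed")
dist.destroy_process_group()
```

Run with one process per GPU; each rank should print the success line, while an assertion failure indicates the collective did not complete correctly.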