custom allreduce + torch.compile (#10121)

Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-26 00:00:16 -06:00
parent 519e8e4182
commit 9a88f89799
6 changed files with 62 additions and 104 deletions
--- a/docs/source/getting_started/debugging.rst
+++ b/docs/source/getting_started/debugging.rst
@@ -86,7 +86,6 @@ If GPU/CPU communication cannot be established, you can use the following Python
    from vllm.distributed.device_communicators.pynccl import PyNcclCommunicator

    pynccl = PyNcclCommunicator(group=gloo_group, device=local_rank)
-    pynccl.disabled = False

    s = torch.cuda.Stream()
    with torch.cuda.stream(s):