[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233)

Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
Yan Burman
2025-01-04 08:50:16 +02:00
committed by GitHub
parent d91457d529
commit 300acb8347
5 changed files with 23 additions and 15 deletions

@@ -836,7 +836,7 @@ class GPUModelRunner:
         # Trigger CUDA graph capture for specific shapes.
         # Capture the large shapes first so that the smaller shapes
         # can reuse the memory pool allocated for the large shapes.
-        with graph_capture():
+        with graph_capture(device=self.device):
             for num_tokens in reversed(self.cudagraph_batch_sizes):
                 for _ in range(self.vllm_config.compilation_config.
                                cudagraph_num_of_warmups):
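
Note on the capture order in the hunk above: the comment says large shapes are captured first so that smaller shapes can reuse their memory pool. The sketch below is a toy model of that reasoning only, not vLLM's or PyTorch's allocator. `ToyPool`, `capture_all`, and the batch-size values are hypothetical names and numbers introduced for illustration, and the reuse rule (a request fits into any already-reserved block that is at least as large) is an assumed simplification.

```python
class ToyPool:
    """Toy stand-in for a shared graph memory pool (not a real allocator)."""

    def __init__(self):
        self.blocks = []  # sizes of the blocks reserved so far

    def request(self, size):
        # Reuse an existing block if one is large enough for this request.
        if any(block >= size for block in self.blocks):
            return
        # Otherwise reserve a new block of exactly the requested size.
        self.blocks.append(size)

    @property
    def reserved(self):
        # Total size reserved across all blocks.
        return sum(self.blocks)


def capture_all(batch_sizes):
    """Simulate capturing one graph per batch size, in the given order."""
    pool = ToyPool()
    for num_tokens in batch_sizes:
        pool.request(num_tokens)
    return pool.reserved


# Hypothetical batch sizes, stored in ascending order.
cudagraph_batch_sizes = [1, 2, 4, 8]

# Largest first, as in the loop over reversed(self.cudagraph_batch_sizes).
largest_first = capture_all(reversed(cudagraph_batch_sizes))
# Smallest first, for comparison.
smallest_first = capture_all(cudagraph_batch_sizes)
```

Under this toy rule, capturing largest first reserves a single block sized for the biggest shape, while capturing smallest first reserves a new block at every step.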