[Model Runner V2] Optimize CUDA graph capture time (#29275)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Woosuk Kwon
2025-11-23 11:15:32 -08:00
committed by GitHub
parent b004c00418
commit 62d54ba46d
2 changed files with 5 additions and 1 deletions

@@ -313,6 +313,7 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
             return 0
         start_time = time.perf_counter()
         gc.collect()
         torch.cuda.empty_cache()
         start_free_gpu_memory = torch.cuda.mem_get_info()[0]