Make PyTorch profiler gzip and CUDA time dump configurable (#29568)

Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
Author: Yifei Zhang
Date: 2025-12-01 12:30:46 +08:00
Committed by: GitHub
Parent: f72a817bdf
Commit: 1ab8fc8197
5 changed files with 35 additions and 13 deletions


@@ -11,6 +11,8 @@ We support tracing vLLM workers using the `torch.profiler` module. You can enabl
- `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1` to record memory usage (off by default)
- `VLLM_TORCH_PROFILER_WITH_STACK=1` to record stack information (on by default)
- `VLLM_TORCH_PROFILER_WITH_FLOPS=1` to record FLOPs (off by default)
- `VLLM_TORCH_PROFILER_USE_GZIP=0` to disable gzip compression of profiling files (on by default)
- `VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL=0` to disable dumping and printing the aggregated CUDA self-time table (on by default)
The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.
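As a sketch, a server launch with these settings might look like the following; the trace directory and model name are placeholders, and the exact `vllm serve` invocation depends on your deployment:

```shell
# Directory where profiler traces will be written (setting this enables profiling)
export VLLM_TORCH_PROFILER_DIR=/tmp/vllm_profile

# Optional toggles from the list above
export VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1   # also record memory usage
export VLLM_TORCH_PROFILER_WITH_FLOPS=1            # also record FLOPs
export VLLM_TORCH_PROFILER_USE_GZIP=0              # keep trace files uncompressed
export VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL=0  # skip the CUDA self-time table

# Placeholder model; substitute your own
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

Leaving `VLLM_TORCH_PROFILER_USE_GZIP` and `VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL` unset keeps both behaviors enabled, matching the defaults listed above.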