Make PyTorch profiler gzip and CUDA time dump configurable (#29568)

Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
Author: Yifei Zhang
Date: 2025-12-01 12:30:46 +08:00
Committed by: GitHub
Parent: f72a817bdf
Commit: 1ab8fc8197
5 changed files with 35 additions and 13 deletions


@@ -11,6 +11,8 @@ We support tracing vLLM workers using the `torch.profiler` module. You can enabl
- `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1` to record memory usage (off by default)
- `VLLM_TORCH_PROFILER_WITH_STACK=1` to record stack information (on by default)
- `VLLM_TORCH_PROFILER_WITH_FLOPS=1` to record FLOPs (off by default)
- `VLLM_TORCH_PROFILER_USE_GZIP=0` to disable gzip compression of profiling files (on by default)
- `VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL=0` to disable dumping and printing the aggregated CUDA self-time table (on by default)
The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.
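As a sketch, a server launch with these settings might look like the following; the trace directory and model name are placeholders, and the exact `vllm serve` invocation depends on your deployment:

```shell
# Directory where profiler traces will be written (setting this enables profiling)
export VLLM_TORCH_PROFILER_DIR=/tmp/vllm_profile

# Optional toggles from the list above
export VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1   # also record memory usage
export VLLM_TORCH_PROFILER_WITH_FLOPS=1            # also record FLOPs
export VLLM_TORCH_PROFILER_USE_GZIP=0              # keep trace files uncompressed
export VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL=0  # skip the CUDA self-time table

# Placeholder model; substitute your own
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

Leaving `VLLM_TORCH_PROFILER_USE_GZIP` and `VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL` unset keeps both behaviors enabled, matching the defaults listed above.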