[Doc] Move CONTRIBUTING to docs site (#9924)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
docs/source/contributing/profiling/profiling_index.rst (new file, 48 lines)
@@ -0,0 +1,48 @@
==============
Profiling vLLM
==============

We support tracing vLLM workers using the ``torch.profiler`` module. You can enable tracing by setting the ``VLLM_TORCH_PROFILER_DIR`` environment variable to the directory where you want to save the traces: ``VLLM_TORCH_PROFILER_DIR=/mnt/traces/``

The OpenAI server also needs to be started with the ``VLLM_TORCH_PROFILER_DIR`` environment variable set.

When using ``benchmarks/benchmark_serving.py``, you can enable profiling by passing the ``--profile`` flag.

.. warning::

   Only enable profiling in a development environment.

Traces can be visualized using https://ui.perfetto.dev/.

.. tip::

   Only send a few requests through vLLM when profiling, as the traces can get quite large. Also, there is no need to untar the traces; they can be viewed directly.

.. tip::

   When you stop the profiler, it flushes all of the profile trace files to the directory. This takes time: for example, flushing about 100 requests' worth of data for a Llama 70B model takes about 10 minutes on an H100.

   Set the environment variable ``VLLM_RPC_TIMEOUT`` to a large value, such as 30 minutes, before you start the server:
   ``export VLLM_RPC_TIMEOUT=1800000``

Example commands and usage:
===========================

Offline Inference:
------------------

Refer to `examples/offline_inference_with_profiler.py <https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_with_profiler.py>`_ for an example.
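
As a minimal sketch of the same idea, assuming the ``LLM.start_profile()`` / ``LLM.stop_profile()`` helpers available in recent vLLM versions (the model and prompt below are placeholder choices):

.. code-block:: python

   import os

   # The environment variable must be set before the engine is constructed,
   # otherwise the profiler stays disabled.
   os.environ["VLLM_TORCH_PROFILER_DIR"] = "./vllm_profile"

   from vllm import LLM, SamplingParams

   # A small model keeps the resulting trace manageable.
   llm = LLM(model="facebook/opt-125m")

   llm.start_profile()
   outputs = llm.generate(["Hello, my name is"],
                          SamplingParams(max_tokens=16))
   llm.stop_profile()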

OpenAI Server:
--------------

.. code-block:: bash

   VLLM_TORCH_PROFILER_DIR=./vllm_profile python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B
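
At the time of writing, profiling a running server is driven through ``/start_profile`` and ``/stop_profile`` POST routes, which is also what ``benchmarks/benchmark_serving.py --profile`` uses. A hedged sketch of calling them directly, assuming the server listens on the default ``localhost:8000``:

.. code-block:: python

   import requests

   BASE_URL = "http://localhost:8000"  # assumed default host/port

   requests.post(f"{BASE_URL}/start_profile")
   # ... send a few inference requests through the server here ...
   requests.post(f"{BASE_URL}/stop_profile")  # triggers the (slow) trace flush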

benchmark_serving.py:
---------------------

.. code-block:: bash

   python benchmarks/benchmark_serving.py --backend vllm --model meta-llama/Meta-Llama-3-70B --dataset-name sharegpt --dataset-path sharegpt.json --profile --num-prompts 2