[Benchmark] Add sampling parameters to benchmark_serving. (#16022)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-05 21:30:35 -07:00
parent 620fc2d09e
commit ba10801961
3 changed files with 80 additions and 3 deletions
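
A rough sketch of what the `--top-k`, `--top-p`, and `--temperature` flags in the README example below amount to on the wire. It assumes the client forwards them as `top_k`, `top_p`, and `temperature` fields in the JSON body sent to `/v1/completions`. The helper `build_completion_payload` is hypothetical and is not code from this commit; the prompt and `max_tokens` values are placeholders.

```python
import json
from typing import Any, Optional


def build_completion_payload(
    model: str,
    prompt: str,
    max_tokens: int,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
) -> dict[str, Any]:
    """Hypothetical helper: merge optional sampling parameters into a request body."""
    payload: dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    sampling = {"top_k": top_k, "top_p": top_p, "temperature": temperature}
    # Include only the parameters that were explicitly set, so that the
    # server's defaults apply to everything else.
    payload.update({k: v for k, v in sampling.items() if v is not None})
    return payload


# Values mirror the README example command (assumed field mapping).
payload = build_completion_payload(
    model="NousResearch/Hermes-3-Llama-3.1-8B",
    prompt="Hello",
    max_tokens=16,
    top_k=10,
    top_p=0.9,
    temperature=0.5,
)
print(json.dumps(payload, indent=2))
```

Leaving unset parameters out of the body, rather than sending `null`, keeps a run without sampling flags identical to a request that never mentioned them.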
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -204,6 +204,24 @@ python3 vllm/benchmarks/benchmark_serving.py \
    --seed 42
 ```

+### Running With Sampling Parameters
+
+When using an OpenAI-compatible backend such as `vllm`, you can pass optional
+sampling parameters (for example `--top-k`, `--top-p`, `--temperature`). Example client command:
+
+```bash
+python3 vllm/benchmarks/benchmark_serving.py \
+  --backend vllm \
+  --model NousResearch/Hermes-3-Llama-3.1-8B \
+  --endpoint /v1/completions \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
+  --top-k 10 \
+  --top-p 0.9 \
+  --temperature 0.5 \
+  --num-prompts 10
+```
+
 ---
 ## Example - Offline Throughput Benchmark