[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Roger Wang <hey@rogerw.me>
Author: d.transposed
Date: 2025-06-24 20:41:49 +02:00
Committed by: GitHub
Parent: a045b7e89a
Commit: c635c5f744
3 changed files with 330 additions and 34 deletions


@@ -269,6 +269,21 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 10
```
### Running With Ramp-Up Request Rate

The benchmark tool also supports ramping up the request rate over the duration of the benchmark run. This can be useful for stress testing the server or for finding the maximum throughput it can sustain within a given latency budget.

Two ramp-up strategies are supported:

- `linear`: Increases the request rate linearly from a start value to an end value.
- `exponential`: Increases the request rate exponentially from a start value to an end value.

The following arguments control the ramp-up:

- `--ramp-up-strategy`: The ramp-up strategy to use (`linear` or `exponential`).
- `--ramp-up-start-rps`: The request rate (in requests per second) at the beginning of the benchmark.
- `--ramp-up-end-rps`: The request rate at the end of the benchmark.
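To build intuition for how the two strategies differ, here is a minimal sketch of what the instantaneous request rate could look like at a fraction of elapsed benchmark time. This is an illustration only, not the actual vLLM implementation: the function name and the exact interpolation formulas (linear interpolation and geometric interpolation between the start and end RPS) are assumptions for demonstration purposes.

```python
def ramp_up_rps(elapsed_s: float, duration_s: float,
                start_rps: float, end_rps: float,
                strategy: str = "linear") -> float:
    """Illustrative request rate at time `elapsed_s` into a benchmark of
    length `duration_s`. Hypothetical helper, not part of vLLM's API."""
    frac = elapsed_s / duration_s  # fraction of the run completed, in [0, 1]
    if strategy == "linear":
        # Straight-line interpolation from start_rps to end_rps.
        return start_rps + (end_rps - start_rps) * frac
    elif strategy == "exponential":
        # Geometric interpolation: the rate grows by a constant factor
        # per unit time, reaching end_rps exactly at the end of the run.
        return start_rps * (end_rps / start_rps) ** frac
    raise ValueError(f"unknown strategy: {strategy}")


# With a linear ramp from 1 to 10 RPS over 100 s, the rate at the halfway
# point is 5.5 RPS; an exponential ramp from 1 to 16 RPS reaches only
# 4 RPS at the halfway point, then accelerates toward the end.
print(ramp_up_rps(50, 100, 1, 10, "linear"))        # 5.5
print(ramp_up_rps(50, 100, 1, 16, "exponential"))   # 4.0
```

The practical difference: a linear ramp spreads the load increase evenly, while an exponential ramp spends more of the run at low rates and pushes the server hardest only near the end, which can help locate a saturation point more precisely.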
---
## Example - Offline Throughput Benchmark