Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput (#23012)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
This commit is contained in:
Seiji Eicher
2025-08-16 03:21:20 -07:00
committed by GitHub
parent 2dbccce8a6
commit de9cb61763
2 changed files with 73 additions and 6 deletions

View File

@@ -40,7 +40,7 @@ become available.
<td><code>wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv</code></td>
</tr>
<tr>
<td><strong>Sonnet</strong></td>
<td><strong>Sonnet (deprecated)</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td>Local file: <code>benchmarks/sonnet.txt</code></td>
@@ -51,6 +51,12 @@ become available.
<td style="text-align: center;">✅</td>
<td><code>synthetic</code></td>
</tr>
<tr>
<td><strong>Prefix Repetition</strong></td>
<td style="text-align: center;">✅</td>
<td style="text-align: center;">✅</td>
<td><code>synthetic</code></td>
</tr>
<tr>
<td><strong>HuggingFace-VisionArena</strong></td>
<td style="text-align: center;">✅</td>
@@ -592,6 +598,20 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```
### Prefix Repetition Dataset
```bash
vllm bench serve \
--backend openai \
--model meta-llama/Llama-2-7b-chat-hf \
--dataset-name prefix_repetition \
--num-prompts 100 \
--prefix-repetition-prefix-len 512 \
--prefix-repetition-suffix-len 128 \
--prefix-repetition-num-prefixes 5 \
--prefix-repetition-output-len 128
```
</details>
## ⚡ Example - Request Prioritization Benchmark