[Misc] support arbitrary MM datasets in spec dec bench (#33486)
Signed-off-by: kkt-cohere <komal@cohere.com> Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
This commit is contained in:
@@ -32,6 +32,7 @@ th {
|
||||
| HuggingFace-Blazedit | ✅ | ✅ | `vdaita/edit_5k_char`, `vdaita/edit_10k_char` |
|
||||
| Spec Bench | ✅ | ✅ | `wget https://raw.githubusercontent.com/hemingkx/Spec-Bench/refs/heads/main/data/spec_bench/question.jsonl` |
|
||||
| Custom | ✅ | ✅ | Local file: `data.jsonl` |
|
||||
| Custom MM | ✅ | ✅ | Local file: `mm_data.jsonl` |
|
||||
|
||||
Legend:
|
||||
|
||||
@@ -133,6 +134,33 @@ vllm bench serve --port 9001 --save-result --save-detailed \
|
||||
|
||||
You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`.
|
||||
|
||||
#### Custom multimodal dataset
|
||||
|
||||
If the multimodal dataset you want to benchmark is not supported yet in vLLM, then you can benchmark on it using `CustomMMDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" and "image_files" field per entry, e.g., `mm_data.jsonl`:
|
||||
|
||||
```json
|
||||
{"prompt": "How many animals are present in the given image?", "image_files": ["/path/to/image/folder/horsepony.jpg"]}
|
||||
{"prompt": "What colour is the bird shown in the image?", "image_files": ["/path/to/image/folder/flycatcher.jpeg"]}
|
||||
```
|
||||
|
||||
```bash
|
||||
# need a model with vision capability here
|
||||
vllm serve Qwen/Qwen2-VL-7B-Instruct
|
||||
```
|
||||
|
||||
```bash
|
||||
# run benchmarking script
|
||||
vllm bench serve--save-result --save-detailed \
|
||||
--backend openai-chat \
|
||||
--model Qwen/Qwen2-VL-7B-Instruct \
|
||||
--endpoint /v1/chat/completions \
|
||||
--dataset-name custom_mm \
|
||||
--dataset-path <path-to-your-mm-data-jsonl> \
|
||||
--allowed-local-media-path /path/to/image/folder
|
||||
```
|
||||
|
||||
Note that we need to use the `openai-chat` backend and `/v1/chat/completions` endpoint for multimodal inputs.
|
||||
|
||||
#### VisionArena Benchmark for Vision Language Models
|
||||
|
||||
```bash
|
||||
|
||||
Reference in New Issue
Block a user