56 lines
2.0 KiB
Markdown
56 lines
2.0 KiB
Markdown
# vllm bench mm-processor
|
|
|
|
## Overview
|
|
|
|
`vllm bench mm-processor` profiles the multimodal input processor pipeline of
|
|
vision-language models. It measures per-stage latency from the HuggingFace
|
|
processor through to the encoder forward pass, helping you identify
|
|
preprocessing bottlenecks and understand how different image resolutions or
|
|
item counts affect end-to-end request time.
|
|
|
|
The benchmark supports two data sources: synthetic random multimodal inputs
|
|
(`random-mm`) and HuggingFace datasets (`hf`). Warmup requests are run before
|
|
measurement to ensure stable results.
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
vllm bench mm-processor \
|
|
--model Qwen/Qwen2-VL-7B-Instruct \
|
|
--dataset-name random-mm \
|
|
--num-prompts 50 \
|
|
--random-input-len 300 \
|
|
--random-output-len 40 \
|
|
--random-mm-base-items-per-request 2 \
|
|
--random-mm-limit-mm-per-prompt '{"image": 3, "video": 0}' \
|
|
--random-mm-bucket-config '{(256, 256, 1): 0.7, (720, 1280, 1): 0.3}'
|
|
```
|
|
|
|
## Measured Stages
|
|
|
|
| Stage | Description |
|
|
|-------|-------------|
|
|
| `get_mm_hashes_secs` | Time spent hashing multimodal inputs |
|
|
| `get_cache_missing_items_secs` | Time spent looking up the processor cache |
|
|
| `apply_hf_processor_secs` | Time spent in the HuggingFace processor |
|
|
| `merge_mm_kwargs_secs` | Time spent merging multimodal kwargs |
|
|
| `apply_prompt_updates_secs` | Time spent updating prompt tokens |
|
|
| `preprocessor_total_secs` | Total preprocessing time |
|
|
| `encoder_forward_secs` | Time spent in the encoder model forward pass |
|
|
| `num_encoder_calls` | Number of encoder invocations per request |
|
|
|
|
The benchmark also reports end-to-end latency (TTFT + decode time) per
|
|
request. Use `--metric-percentiles` to select which percentiles to report
|
|
(default: p99) and `--output-json` to save results.
|
|
|
|
For more examples (HF datasets, warmup, JSON output), see
|
|
[Benchmarking CLI — Multimodal Processor Benchmark](../../benchmarking/cli.md#multimodal-processor-benchmark).
|
|
|
|
## JSON CLI Arguments
|
|
|
|
--8<-- "docs/cli/json_tip.inc.md"
|
|
|
|
## Arguments
|
|
|
|
--8<-- "docs/generated/argparse/bench_mm_processor.inc.md"
|