
# vllm bench mm-processor

## Overview

vllm bench mm-processor profiles the multimodal input processor pipeline of vision-language models. It measures per-stage latency from the HuggingFace processor through to the encoder forward pass, helping you identify preprocessing bottlenecks and understand how different image resolutions or item counts affect end-to-end request time.

The benchmark supports two data sources: synthetic random multimodal inputs (random-mm) and HuggingFace datasets (hf). Warmup requests are run before measurement to ensure stable results.

## Quick Start

```bash
vllm bench mm-processor \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --dataset-name random-mm \
  --num-prompts 50 \
  --random-input-len 300 \
  --random-output-len 40 \
  --random-mm-base-items-per-request 2 \
  --random-mm-limit-mm-per-prompt '{"image": 3, "video": 0}' \
  --random-mm-bucket-config '{(256, 256, 1): 0.7, (720, 1280, 1): 0.3}'
```
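
The `--random-mm-bucket-config` value is a Python-style dict literal mapping resolution buckets (the tuples in the example above appear to be `(height, width, num_frames)`) to sampling probabilities that should sum to 1. A minimal sketch of how such a value can be parsed and sanity-checked before passing it to the CLI; the `validate_bucket_config` helper is illustrative and not part of vLLM:

```python
import ast


def validate_bucket_config(config_str: str) -> dict:
    """Parse and sanity-check a --random-mm-bucket-config value.

    Illustrative helper, not part of vLLM: keys are assumed to be
    (height, width, num_frames) tuples, values sampling probabilities.
    """
    config = ast.literal_eval(config_str)
    for key, prob in config.items():
        if not (isinstance(key, tuple) and len(key) == 3):
            raise ValueError(f"expected (height, width, num_frames), got {key!r}")
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {key}: {prob}")
    total = sum(config.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return config


config = validate_bucket_config("{(256, 256, 1): 0.7, (720, 1280, 1): 0.3}")
print(f"{len(config)} buckets, total probability {sum(config.values()):.1f}")
```

Skewing the probabilities toward one bucket is a quick way to see how a specific resolution affects per-stage latency without changing the rest of the workload.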

## Measured Stages

| Stage | Description |
|-------|-------------|
| `get_mm_hashes_secs` | Time spent hashing multimodal inputs |
| `get_cache_missing_items_secs` | Time spent looking up the processor cache |
| `apply_hf_processor_secs` | Time spent in the HuggingFace processor |
| `merge_mm_kwargs_secs` | Time spent merging multimodal kwargs |
| `apply_prompt_updates_secs` | Time spent updating prompt tokens |
| `preprocessor_total_secs` | Total preprocessing time |
| `encoder_forward_secs` | Time spent in the encoder model forward pass |
| `num_encoder_calls` | Number of encoder invocations per request |
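
When reading results, the preprocessing sub-stage times should add up to roughly `preprocessor_total_secs`, with any remainder being untimed glue code. A small sketch of that sanity check; the record below uses the stage names from the table, but the actual field names emitted by the benchmark may differ:

```python
# Hypothetical per-request record shaped like the stage table above;
# the values are made up for illustration.
record = {
    "get_mm_hashes_secs": 0.004,
    "get_cache_missing_items_secs": 0.001,
    "apply_hf_processor_secs": 0.120,
    "merge_mm_kwargs_secs": 0.006,
    "apply_prompt_updates_secs": 0.003,
    "preprocessor_total_secs": 0.135,
}

substages = [
    k for k in record
    if k.endswith("_secs") and k != "preprocessor_total_secs"
]
substage_sum = sum(record[k] for k in substages)

# The total includes untimed overhead, so expect total >= sum of sub-stages.
overhead = record["preprocessor_total_secs"] - substage_sum
print(f"sub-stage sum: {substage_sum:.3f}s, overhead: {overhead:.3f}s")
```

In this made-up record the HuggingFace processor dominates, which is the typical pattern the benchmark is designed to surface.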

The benchmark also reports end-to-end latency (TTFT + decode time) per request. Use `--metric-percentiles` to select which percentiles to report (default: p99) and `--output-json` to save results.
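
Percentile metrics like p99 are driven by the slowest requests in the run, which is why they are the default here. vLLM's own percentile computation may use interpolation and different field names; this nearest-rank sketch over made-up latencies just illustrates why one slow request dominates p99 while barely moving the median:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample >= p% of the data."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[rank - 1]


# Illustrative per-request end-to-end latencies; one outlier request.
latencies = [0.21, 0.25, 0.24, 0.30, 0.22, 0.27, 0.95, 0.23]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50: {p50:.2f}s, p99: {p99:.2f}s")
```

Passing several values to `--metric-percentiles` (for example p50 alongside p99) makes this kind of skew visible directly in the benchmark output.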

For more examples (HF datasets, warmup, JSON output), see Benchmarking CLI — Multimodal Processor Benchmark.

## JSON CLI Arguments

--8<-- "docs/cli/json_tip.inc.md"

## Arguments

--8<-- "docs/generated/argparse/bench_mm_processor.inc.md"