
# vllm bench mm-processor

## Overview

vllm bench mm-processor profiles the multimodal input processor pipeline of vision-language models. It measures per-stage latency from the HuggingFace processor through to the encoder forward pass, helping you identify preprocessing bottlenecks and understand how different image resolutions or item counts affect end-to-end request time.

The benchmark supports two data sources: synthetic random multimodal inputs (random-mm) and HuggingFace datasets (hf). Warmup requests are run before measurement to ensure stable results.

## Quick Start

```bash
vllm bench mm-processor \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --dataset-name random-mm \
  --num-prompts 50 \
  --random-input-len 300 \
  --random-output-len 40 \
  --random-mm-base-items-per-request 2 \
  --random-mm-limit-mm-per-prompt '{"image": 3, "video": 0}' \
  --random-mm-bucket-config '{(256, 256, 1): 0.7, (720, 1280, 1): 0.3}'
```
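
The `--random-mm-bucket-config` value is a Python-style dict literal mapping resolution buckets (the tuples in the example above appear to be `(height, width, num_frames)`) to sampling probabilities that should sum to 1. A minimal sketch of how such a value can be parsed and sanity-checked before passing it to the CLI; the `validate_bucket_config` helper is illustrative and not part of vLLM:

```python
import ast


def validate_bucket_config(config_str: str) -> dict:
    """Parse and sanity-check a --random-mm-bucket-config value.

    Illustrative helper, not part of vLLM: keys are assumed to be
    (height, width, num_frames) tuples, values sampling probabilities.
    """
    config = ast.literal_eval(config_str)
    for key, prob in config.items():
        if not (isinstance(key, tuple) and len(key) == 3):
            raise ValueError(f"expected (height, width, num_frames), got {key!r}")
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {key}: {prob}")
    total = sum(config.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return config


config = validate_bucket_config("{(256, 256, 1): 0.7, (720, 1280, 1): 0.3}")
print(f"{len(config)} buckets, total probability {sum(config.values()):.1f}")
```

Skewing the probabilities toward one bucket is a quick way to see how a specific resolution affects per-stage latency without changing the rest of the workload.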

## Measured Stages

| Stage | Description |
|-------|-------------|
| `get_mm_hashes_secs` | Time spent hashing multimodal inputs |
| `get_cache_missing_items_secs` | Time spent looking up the processor cache |
| `apply_hf_processor_secs` | Time spent in the HuggingFace processor |
| `merge_mm_kwargs_secs` | Time spent merging multimodal kwargs |
| `apply_prompt_updates_secs` | Time spent updating prompt tokens |
| `preprocessor_total_secs` | Total preprocessing time |
| `encoder_forward_secs` | Time spent in the encoder model forward pass |
| `num_encoder_calls` | Number of encoder invocations per request |
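
When reading results, the preprocessing sub-stage times should add up to roughly `preprocessor_total_secs`, with any remainder being untimed glue code. A small sketch of that sanity check; the record below uses the stage names from the table, but the actual field names emitted by the benchmark may differ:

```python
# Hypothetical per-request record shaped like the stage table above;
# the values are made up for illustration.
record = {
    "get_mm_hashes_secs": 0.004,
    "get_cache_missing_items_secs": 0.001,
    "apply_hf_processor_secs": 0.120,
    "merge_mm_kwargs_secs": 0.006,
    "apply_prompt_updates_secs": 0.003,
    "preprocessor_total_secs": 0.135,
}

substages = [
    k for k in record
    if k.endswith("_secs") and k != "preprocessor_total_secs"
]
substage_sum = sum(record[k] for k in substages)

# The total includes untimed overhead, so expect total >= sum of sub-stages.
overhead = record["preprocessor_total_secs"] - substage_sum
print(f"sub-stage sum: {substage_sum:.3f}s, overhead: {overhead:.3f}s")
```

In this made-up record the HuggingFace processor dominates, which is the typical pattern the benchmark is designed to surface.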

The benchmark also reports end-to-end latency (TTFT + decode time) per request. Use `--metric-percentiles` to select which percentiles to report (default: p99) and `--output-json` to save results.
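
Percentile metrics like p99 are driven by the slowest requests in the run, which is why they are the default here. vLLM's own percentile computation may use interpolation and different field names; this nearest-rank sketch over made-up latencies just illustrates why one slow request dominates p99 while barely moving the median:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample >= p% of the data."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[rank - 1]


# Illustrative per-request end-to-end latencies; one outlier request.
latencies = [0.21, 0.25, 0.24, 0.30, 0.22, 0.27, 0.95, 0.23]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50: {p50:.2f}s, p99: {p99:.2f}s")
```

Passing several values to `--metric-percentiles` (for example p50 alongside p99) makes this kind of skew visible directly in the benchmark output.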

For more examples (HF datasets, warmup, JSON output), see Benchmarking CLI — Multimodal Processor Benchmark.

## JSON CLI Arguments

--8<-- "docs/cli/json_tip.inc.md"

## Arguments

--8<-- "docs/generated/argparse/bench_mm_processor.inc.md"