Benchmarks

This directory contains vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

  • Serving benchmarks: Scripts for testing online inference performance (latency and throughput) against a running server
  • Throughput benchmarks: Scripts for testing offline batch inference performance
  • Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
  • Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
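As a sketch of typical invocations (the model name, dataset path, and flag values below are illustrative; check `vllm bench --help` for the authoritative options), the serving and throughput benchmarks are run via the `vllm bench` subcommands:

```shell
# Online serving benchmark: measures request latency and throughput
# against an already-running server (start one first, e.g. `vllm serve <model>`).
vllm bench serve \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --dataset-name sharegpt \
  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 100

# Offline throughput benchmark: batch inference with no server involved.
vllm bench throughput \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --input-len 128 \
  --output-len 128
```

Both commands print summary statistics (e.g. request throughput and token throughput) when the run completes.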

For full CLI reference see: