vllm/docs/configuration (at commit c4e744dbd41f23ee7fd554a86cc1bf516082552d)
| File | Last commit | Date |
| --- | --- | --- |
| conserving_memory.md | [Doc] Update more docs with respect to V1 (#29188) | 2025-11-23 |
| engine_args.md | [Docs] Fix some snippets (#31378) | 2025-12-26 |
| env_vars.md | [Docs] Take env var definition out of folded admonition (#29005) | 2025-11-19 |
| model_resolution.md | [Misc] unify variable for LLM instance (#20996) | 2025-07-21 |
| optimization.md | [DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109) | 2026-01-27 |
| README.md | [Docs] Replace all explicit anchors with real links (#27087) | 2025-10-17 |
| serve_args.md | [Doc] Fix failing doc build (#28772) | 2025-11-15 |

README.md

Configuration Options

This section lists the most common options for running vLLM.

There are three main levels of configuration, from highest to lowest priority (a minimal sketch follows the list):

  • Request parameters and input arguments
  • Engine arguments
  • Environment variables
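
As a rough illustration of how the three levels interact, here is a minimal sketch using the offline `LLM` API. The model name, engine arguments, and environment variable are illustrative choices, not recommendations; see engine_args.md and env_vars.md for the authoritative lists.

```python
import os

# Lowest priority: environment variables, read when vLLM starts.
# VLLM_LOGGING_LEVEL is one documented vLLM env var; see env_vars.md.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

from vllm import LLM, SamplingParams

# Middle priority: engine arguments, fixed for the lifetime of the engine.
llm = LLM(model="facebook/opt-125m", max_model_len=2048)

# Highest priority: per-request parameters, which override engine-level
# defaults for this request only.
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```

The same layering applies when serving: `vllm serve` flags set the engine arguments (see serve_args.md), while fields in each HTTP request play the role of request parameters.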