[Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
Harry Mellor
2025-08-11 08:13:33 +01:00
committed by GitHub
parent 1e55dfa7e5
commit bc1d02ac85
20 changed files with 205 additions and 110 deletions


@@ -1,7 +1,3 @@
---
toc_depth: 4
---
# vLLM CLI Guide
The vllm command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:
@@ -16,52 +12,48 @@ Available Commands:
vllm {chat,complete,serve,bench,collect-env,run-batch}
```
When passing JSON CLI arguments, the following sets of arguments are equivalent:
- `--json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'`
- `--json-arg.key1 value1 --json-arg.key2.key3 value2`
Additionally, list elements can be passed individually using `+`:
- `--json-arg '{"key4": ["value3", "value4", "value5"]}'`
- `--json-arg.key4+ value3 --json-arg.key4+='value4,value5'`
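The expansion semantics above can be illustrated with a toy sketch. This is not vLLM's actual parser, just a minimal Python model of the dotted-key and `+` rules described by the bullet points:

```python
import json

def set_nested(cfg: dict, dotted_key: str, value: str) -> dict:
    """Toy expansion of the dotted JSON syntax: "a.b.c" walks/creates
    nested dicts; a trailing "+" on the last segment appends to a list,
    splitting comma-separated values into separate elements."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    if leaf.endswith("+"):
        node.setdefault(leaf[:-1], []).extend(value.split(","))
    else:
        node[leaf] = value
    return cfg

cfg: dict = {}
for key, value in [
    ("json-arg.key1", "value1"),
    ("json-arg.key2.key3", "value2"),
    ("json-arg.key4+", "value3"),
    ("json-arg.key4+", "value4,value5"),
]:
    set_nested(cfg, key, value)

print(json.dumps(cfg["json-arg"]))
```

Applying all four dotted arguments reproduces the single-JSON form shown above.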
## serve

Start the vLLM OpenAI Compatible API server.

??? console "Examples"

    Start with a model:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf
    ```

    Specify the port:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf --port 8100
    ```

    Serve over a Unix domain socket:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf --uds /tmp/vllm.sock
    ```

    Check with --help for more options:

    ```bash
    # To list all groups
    vllm serve --help=listgroup

    # To view an argument group
    vllm serve --help=ModelConfig

    # To view a single argument
    vllm serve --help=max-num-seqs

    # To search by keyword
    vllm serve --help=max

    # To view full help with pager (less/more)
    vllm serve --help=page
    ```

See [vllm serve](./serve.md) for the full reference of all available arguments.
## chat
@@ -78,6 +70,8 @@ vllm chat --url http://{vllm-serve-host}:{vllm-serve-port}/v1
vllm chat --quick "hi"
```
See [vllm chat](./chat.md) for the full reference of all available arguments.
## complete
Generate text completions based on the given prompt via the running API server.
@@ -93,7 +87,7 @@ vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1
vllm complete --quick "The future of AI is"
```
</details>
See [vllm complete](./complete.md) for the full reference of all available arguments.
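Under the hood, `vllm complete` talks to the server's `/v1/completions` endpoint with an OpenAI-style request body. A sketch of building such a payload by hand (the model name and `max_tokens` value here are placeholders, not defaults):

```python
import json

payload = {
    "model": "meta-llama/Llama-2-7b-hf",  # placeholder model name
    "prompt": "The future of AI is",
    "max_tokens": 32,  # illustrative value
}
# Send with e.g. requests.post(f"{base_url}/v1/completions", json=payload)
# against a running `vllm serve` instance.
print(json.dumps(payload))
```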
## bench
@@ -120,6 +114,8 @@ vllm bench latency \
--load-format dummy
```
See [vllm bench latency](./bench/latency.md) for the full reference of all available arguments.
### serve
Benchmark the online serving throughput.
@@ -134,6 +130,8 @@ vllm bench serve \
--num-prompts 5
```
See [vllm bench serve](./bench/serve.md) for the full reference of all available arguments.
### throughput
Benchmark offline inference throughput.
@@ -147,6 +145,8 @@ vllm bench throughput \
--load-format dummy
```
See [vllm bench throughput](./bench/throughput.md) for the full reference of all available arguments.
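The headline numbers these benchmarks produce boil down to requests and tokens processed per wall-clock second. A back-of-the-envelope sketch of that arithmetic (all numbers are made up for illustration, not output from a real run):

```python
# Hypothetical run: 5 prompts, per-request token counts, elapsed wall time.
prompt_tokens = [128, 256, 64, 512, 96]
generated_tokens = [100, 100, 100, 100, 100]
elapsed_s = 4.0

total_tokens = sum(prompt_tokens) + sum(generated_tokens)
req_per_s = len(prompt_tokens) / elapsed_s
tok_per_s = total_tokens / elapsed_s
print(f"{req_per_s:.2f} req/s, {tok_per_s:.2f} tok/s")
```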
## collect-env
Start collecting environment information.
@@ -159,24 +159,25 @@ vllm collect-env
Run batch prompts and write results to file.
<details>
<summary>Examples</summary>
Running with a local file:

```bash
vllm run-batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

Using a remote file:

```bash
vllm run-batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```
</details>
See [vllm run-batch](./run-batch.md) for the full reference of all available arguments.
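The `-i` input is a JSONL file with one request object per line, following the OpenAI-style batch schema (`custom_id`, `method`, `url`, `body`) that the bundled example file uses. A sketch of generating such a file (the `custom_id` values, prompts, and output filename are arbitrary):

```python
import json

requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello!", "What is 2 + 2?"])
]

# One JSON object per line, as expected by `vllm run-batch -i ...`.
with open("openai_example_batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```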
## More Help