[Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

@@ -1,7 +1,3 @@

# vLLM CLI Guide

The vllm command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:

```bash
vllm --help
```

@@ -16,52 +12,48 @@

Available Commands:

```bash
vllm {chat,complete,serve,bench,collect-env,run-batch}
```

When passing JSON CLI arguments, the following sets of arguments are equivalent (see the concrete example after the lists below):

- `--json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'`
- `--json-arg.key1 value1 --json-arg.key2.key3 value2`

Additionally, list elements can be passed individually using `+`:

- `--json-arg '{"key4": ["value3", "value4", "value5"]}'`
- `--json-arg.key4+ value3 --json-arg.key4+='value4,value5'`
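
As a concrete sketch, using the engine's JSON-valued `--compilation-config` argument purely for illustration (any JSON argument should behave the same way), the following two invocations should be equivalent:

```bash
# Nested value passed as a single JSON object
vllm serve meta-llama/Llama-2-7b-hf --compilation-config '{"level": 3}'

# The same value via the dotted shorthand
vllm serve meta-llama/Llama-2-7b-hf --compilation-config.level 3
```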

## serve

Start the vLLM OpenAI Compatible API server.

??? console "Examples"

    Start with a model:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf
    ```

    Specify the port:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf --port 8100
    ```

    Serve over a Unix domain socket:

    ```bash
    vllm serve meta-llama/Llama-2-7b-hf --uds /tmp/vllm.sock
    ```

    Check with `--help` for more options:

    ```bash
    # To list all groups
    vllm serve --help=listgroup

    # To view an argument group
    vllm serve --help=ModelConfig

    # To view a single argument
    vllm serve --help=max-num-seqs

    # To search by keyword
    vllm serve --help=max

    # To view full help with pager (less/more)
    vllm serve --help=page
    ```

See [vllm serve](./serve.md) for the full reference of all available arguments.
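
Once the server is up, you can smoke-test it from another shell via its OpenAI-compatible HTTP endpoints. A minimal sketch, assuming the default `localhost:8000` and no `--api-key`:

```bash
# List the models the server exposes
curl http://localhost:8000/v1/models

# Request a short completion
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-2-7b-hf", "prompt": "Hello, my name is", "max_tokens": 8}'
```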

## chat

Generate chat completions via the running API server.

@@ -78,6 +70,8 @@

```bash
vllm chat --url http://{vllm-serve-host}:{vllm-serve-port}/v1
vllm chat --quick "hi"
```

See [vllm chat](./chat.md) for the full reference of all available arguments.

## complete

Generate text completions based on the given prompt via the running API server.

@@ -93,7 +87,7 @@

```bash
vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1
vllm complete --quick "The future of AI is"
```

See [vllm complete](./complete.md) for the full reference of all available arguments.

## bench

@@ -120,6 +114,8 @@

### latency

Benchmark the latency of a single batch of requests.

```bash
vllm bench latency \
    --load-format dummy
```

See [vllm bench latency](./bench/latency.md) for the full reference of all available arguments.
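
For a fuller run, something like the following should work. A sketch with illustrative values, assuming the benchmark's synthetic-workload flags `--input-len` and `--output-len`; `--load-format dummy` skips downloading real weights:

```bash
# Measure single-batch latency with dummy weights and tiny shapes
vllm bench latency \
    --model meta-llama/Llama-2-7b-hf \
    --input-len 32 \
    --output-len 1 \
    --load-format dummy
```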

### serve

Benchmark the online serving throughput.

@@ -134,6 +130,8 @@

```bash
vllm bench serve \
    --num-prompts 5
```

See [vllm bench serve](./bench/serve.md) for the full reference of all available arguments.
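
Because `vllm bench serve` drives a live endpoint, a server must be running first. A minimal end-to-end sketch, assuming the benchmark targets the default `localhost:8000`:

```bash
# Terminal 1: a lightweight server to benchmark (dummy weights, no download)
vllm serve meta-llama/Llama-2-7b-hf --load-format dummy

# Terminal 2: drive it with a handful of prompts
vllm bench serve \
    --model meta-llama/Llama-2-7b-hf \
    --num-prompts 5
```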

### throughput

Benchmark offline inference throughput.

@@ -147,6 +145,8 @@

```bash
vllm bench throughput \
    --load-format dummy
```

See [vllm bench throughput](./bench/throughput.md) for the full reference of all available arguments.
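
As with the latency benchmark, a fuller sketch might look like this (illustrative values; `--input-len`/`--output-len` are assumed to control the synthetic prompt and generation lengths):

```bash
# Offline throughput over synthetic prompts with dummy weights
vllm bench throughput \
    --model meta-llama/Llama-2-7b-hf \
    --input-len 32 \
    --output-len 1 \
    --load-format dummy
```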

## collect-env

Start collecting environment information.

```bash
vllm collect-env
```

@@ -159,24 +159,25 @@

## run-batch

Run batch prompts and write results to file.

<details>
<summary>Examples</summary>

Running with a local file:

```bash
vllm run-batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

Using a remote file:

```bash
vllm run-batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```
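
Each line of the output file is a single JSON result object, so standard JSONL tooling applies. For example, assuming `jq` is installed:

```bash
# Pretty-print the first result from the batch output
head -n 1 results.jsonl | jq .
```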

</details>

See [vllm run-batch](./run-batch.md) for the full reference of all available arguments.

## More Help