[Doc] Add more tips to avoid OOM (#16765)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 17:54:34 +08:00
parent a6481525b8
commit 61a44a0b22
2 changed files with 33 additions and 0 deletions
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -33,11 +33,13 @@ print(completion.choices[0].message)
 vLLM supports some parameters that are not supported by OpenAI, `top_k` for example.
 You can pass these parameters to vLLM using the OpenAI client in the `extra_body` parameter of your requests, i.e. `extra_body={"top_k": 50}` for `top_k`.
 :::
+
 :::{important}
 By default, the server applies `generation_config.json` from the Hugging Face model repository if it exists. This means the default values of certain sampling parameters can be overridden by those recommended by the model creator.

 To disable this behavior, please pass `--generation-config vllm` when launching the server.
 :::
+
 ## Supported APIs

 We currently support the following OpenAI APIs:
@@ -172,6 +174,12 @@ print(completion._request_id)

 The `vllm serve` command is used to launch the OpenAI-compatible server.

+:::{tip}
+Most command-line arguments are based on those for offline inference.
+
+See [Configuration options](configuration-options) for some common options.
+:::
+
 :::{argparse}
 :module: vllm.entrypoints.openai.cli_args
 :func: create_parser_for_docs