[Doc] Add more tips to avoid OOM (#16765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
@@ -33,11 +33,13 @@ print(completion.choices[0].message)
|
||||
vLLM supports some parameters that are not supported by OpenAI, `top_k` for example.
|
||||
You can pass these parameters to vLLM using the OpenAI client in the `extra_body` parameter of your requests, i.e. `extra_body={"top_k": 50}` for `top_k`.
|
||||
:::
|
||||
|
||||
:::{important}
|
||||
By default, the server applies `generation_config.json` from the Hugging Face model repository if it exists. This means the default values of certain sampling parameters can be overridden by those recommended by the model creator.
|
||||
|
||||
To disable this behavior, please pass `--generation-config vllm` when launching the server.
|
||||
:::
|
||||
|
||||
## Supported APIs
|
||||
|
||||
We currently support the following OpenAI APIs:
|
||||
@@ -172,6 +174,12 @@ print(completion._request_id)
|
||||
|
||||
The `vllm serve` command is used to launch the OpenAI-compatible server.
|
||||
|
||||
:::{tip}
|
||||
The vast majority of command-line arguments are based on those for offline inference.
|
||||
|
||||
See [here](configuration-options) for some common options.
|
||||
:::
|
||||
|
||||
:::{argparse}
|
||||
:module: vllm.entrypoints.openai.cli_args
|
||||
:func: create_parser_for_docs
|
||||
|
||||
Reference in New Issue
Block a user