[Misc][Doc] Add note regarding loading generation_config by default (#15281)

Signed-off-by: Roger Wang <ywang@roblox.com>
Roger Wang
2025-03-23 14:00:55 -07:00
committed by GitHub
parent d6cd59f122
commit 9c5c81b0da
4 changed files with 27 additions and 1 deletions


@@ -46,6 +46,11 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
:::{important}
By default, vLLM applies the sampling parameters recommended by the model creator by loading `generation_config.json` from the Hugging Face model repository, if it exists. In most cases, this provides the best results by default when {class}`~vllm.SamplingParams` is not specified.
However, if vLLM's default sampling parameters are preferred, pass `generation_config="vllm"` when creating the {class}`~vllm.LLM` instance.
:::
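The precedence described above can be sketched as plain Python. This is an illustrative model of the documented behavior, not vLLM's actual implementation; the function name, the default values, and the `generation_config` keyword handling here are assumptions for demonstration only.

```python
# Hypothetical sketch of the sampling-parameter precedence described above:
# explicit user parameters win, then generation_config.json values,
# then vLLM's built-in defaults. Not vLLM's real code.

VLLM_DEFAULTS = {"temperature": 1.0, "top_p": 1.0}  # assumed defaults for illustration

def resolve_sampling_params(user_params=None, hf_generation_config=None,
                            generation_config="auto"):
    """Merge sampling parameters following the documented precedence.

    generation_config="auto": apply generation_config.json if present.
    generation_config="vllm": ignore it and keep vLLM's own defaults.
    """
    params = dict(VLLM_DEFAULTS)
    if generation_config == "auto" and hf_generation_config:
        params.update(hf_generation_config)  # model creator's recommendations
    if user_params:
        params.update(user_params)           # explicitly passed values always win
    return params

# With a repo generation_config.json recommending temperature 0.6:
hf_cfg = {"temperature": 0.6}
print(resolve_sampling_params(hf_generation_config=hf_cfg))
print(resolve_sampling_params(hf_generation_config=hf_cfg,
                              generation_config="vllm"))
```

In this sketch, the first call picks up the repo-recommended temperature, while the second keeps the built-in defaults, mirroring the effect of passing `generation_config="vllm"`.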
A code example can be found here: <gh-file:examples/offline_inference/basic/basic.py>
### `LLM.beam_search`