[Frontend] generation_config.json for maximum tokens(#12242)

Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-26 06:59:25 -05:00
parent a5255270c3
commit 9ddc35220b
4 changed files with 145 additions and 9 deletions
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -939,7 +939,9 @@ class EngineArgs:
            "Defaults to None, will use the default generation config in vLLM. "
            "If set to 'auto', the generation config will be automatically "
            "loaded from model. If set to a folder path, the generation config "
-            "will be loaded from the specified folder path.")
+            "will be loaded from the specified folder path. If "
+            "`max_new_tokens` is specified, then it sets a server-wide limit "
+            "on the number of output tokens for all requests.")

        parser.add_argument("--enable-sleep-mode",
                            action="store_true",