[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)

2024-07-17 15:43:21 +08:00
parent a19e8d3726
commit 5bf35a91e4
23 changed files with 155 additions and 175 deletions
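
The quickstart hunk below replaces the `python -m vllm.entrypoints.openai.api_server --model <name>` invocation with `vllm serve <name>`: the model becomes a positional argument. A minimal sketch of that mapping follows. The helper `to_vllm_serve` is hypothetical (it is not part of vLLM), and it assumes every flag other than `--model` passes through unchanged, which this diff confirms only for `--chat-template`.

```python
import shlex

# Prefix of the legacy invocation shown in the removed lines of the diff.
LEGACY_PREFIX = ["python", "-m", "vllm.entrypoints.openai.api_server"]


def to_vllm_serve(legacy_command: str) -> str:
    """Rewrite a legacy api_server command line as a `vllm serve` command.

    Hypothetical helper for illustration only. It handles the
    `--model VALUE` spelling; `--model=VALUE` is not handled.
    """
    args = shlex.split(legacy_command)
    if args[:3] != LEGACY_PREFIX:
        raise ValueError("not a legacy api_server invocation")

    rest = args[3:]
    if "--model" not in rest:
        raise ValueError("expected a --model argument")

    i = rest.index("--model")
    if i + 1 >= len(rest):
        raise ValueError("--model needs a value")

    model = rest[i + 1]
    # Assumption: all remaining flags are accepted unchanged by `vllm serve`.
    remaining = rest[:i] + rest[i + 2:]
    return shlex.join(["vllm", "serve", model, *remaining])


if __name__ == "__main__":
    print(to_vllm_serve(
        "python -m vllm.entrypoints.openai.api_server "
        "--model facebook/opt-125m "
        "--chat-template ./examples/template_chatml.jinja"
    ))
```

Run against the two commands removed in the quickstart hunk, the sketch yields the same command lines that the hunk adds.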
--- a/docs/source/getting_started/quickstart.rst
+++ b/docs/source/getting_started/quickstart.rst
@@ -73,16 +73,13 @@ Start the server:

 .. code-block:: console

-    $ python -m vllm.entrypoints.openai.api_server \
-    $     --model facebook/opt-125m
+    $ vllm serve facebook/opt-125m

 By default, the server uses a predefined chat template stored in the tokenizer. You can override this template by using the ``--chat-template`` argument:

 .. code-block:: console

-   $ python -m vllm.entrypoints.openai.api_server \
-   $     --model facebook/opt-125m \
-   $     --chat-template ./examples/template_chatml.jinja
+    $ vllm serve facebook/opt-125m --chat-template ./examples/template_chatml.jinja

 This server can be queried in the same format as the OpenAI API. For example, list the models: