Support download models from www.modelscope.cn (#1588)

2023-11-18 12:38:31 +08:00
parent bb00f66e19
commit edb305584b
4 changed files with 58 additions and 4 deletions
--- a/docs/source/getting_started/quickstart.rst
+++ b/docs/source/getting_started/quickstart.rst
@@ -40,6 +40,16 @@ Initialize vLLM's engine for offline inference with the ``LLM`` class and the `O

    llm = LLM(model="facebook/opt-125m")

+To use a model from www.modelscope.cn, set the ``VLLM_USE_MODELSCOPE`` environment variable and pass the ModelScope model ID:
+
+.. code-block:: shell
+
+    export VLLM_USE_MODELSCOPE=True
+
+.. code-block:: python
+
+    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)
+
 Call ``llm.generate`` to generate the outputs. It adds the input prompts to vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of ``RequestOutput`` objects, which include all the output tokens.

 .. code-block:: python
@@ -67,6 +77,16 @@ Start the server:

    $ python -m vllm.entrypoints.api_server

+To serve a model from www.modelscope.cn, set ``VLLM_USE_MODELSCOPE`` when starting the server:
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.api_server \
+    $    --model="qwen/Qwen-7B-Chat" \
+    $    --revision="v1.1.8" \
+    $    --trust-remote-code
+
+
 By default, this command starts the server at ``http://localhost:8000`` with the OPT-125M model.

 Query the model in shell:
@@ -95,6 +115,13 @@ Start the server:
    $ python -m vllm.entrypoints.openai.api_server \
    $     --model facebook/opt-125m

+To serve a model from www.modelscope.cn, set ``VLLM_USE_MODELSCOPE`` when starting the OpenAI-compatible server:
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.openai.api_server \
+    $     --model="qwen/Qwen-7B-Chat" --revision="v1.1.8" --trust-remote-code
+
 By default, it starts the server at ``http://localhost:8000``. You can specify the address with ``--host`` and ``--port`` arguments. The server currently hosts one model at a time (OPT-125M in the above command) and implements `list models <https://platform.openai.com/docs/api-reference/models/list>`_ and `create completion <https://platform.openai.com/docs/api-reference/completions/create>`_ endpoints. We are actively adding support for more endpoints.

 This server can be queried in the same format as OpenAI API. For example, list the models: