[Doc] Convert docs to use colon fences (#12471)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor
2025-01-29 03:38:29 +00:00
committed by GitHub
parent a7e3eba66f
commit dd6a3a02cb
68 changed files with 2352 additions and 2341 deletions
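This commit rewrites MyST directives such as `{note}` so that they are opened and closed with `:::` instead of three backticks. The commit does not include a conversion script. The following is a minimal sketch of how such a rewrite could be automated, and the function name `to_colon_fences` is hypothetical.

The sketch makes these assumptions:

- Each directive uses exactly three backticks.
- Directives are not nested inside one another.
- Any code block inside a directive declares a language, because a bare fence inside a directive is treated as the directive's closing fence.

```python
import re

# Opening line of a MyST directive written with a backtick fence, e.g. "```{note}".
# Assumption: exactly three backticks, optional indentation, optional arguments.
DIRECTIVE_OPEN = re.compile(r"^(?P<indent>\s*)```\{(?P<name>[\w-]+)\}(?P<rest>.*)$")
# Any backtick fence line (opening or closing).
FENCE = re.compile(r"^\s*```")


def to_colon_fences(markdown: str) -> str:
    """Hypothetical helper: rewrite ```{name} ... ``` directives as :::{name} ... :::.

    Ordinary code blocks (```bash, ```python, ...) are copied unchanged,
    including ones nested inside a directive.
    """
    out = []
    in_directive = False  # inside a directive that is being converted
    in_code = False       # inside an ordinary code block
    for line in markdown.splitlines():
        if in_code:
            # Only a bare ``` line closes an ordinary code block.
            if line.strip() == "```":
                in_code = False
            out.append(line)
            continue
        m = DIRECTIVE_OPEN.match(line)
        if m and not in_directive:
            in_directive = True
            out.append(f"{m['indent']}:::{{{m['name']}}}{m['rest']}")
        elif in_directive and line.strip() == "```":
            # A bare fence inside a directive is treated as the directive's close.
            in_directive = False
            out.append(line.replace("```", ":::", 1))
        elif FENCE.match(line):
            # Opening fence of an ordinary code block.
            in_code = True
            out.append(line)
        else:
            out.append(line)
    result = "\n".join(out)
    # splitlines() drops the final newline, so restore it if the input had one.
    return result + "\n" if markdown.endswith("\n") else result
```

Plain code blocks, such as the `pip install vllm` and `vllm serve` examples in the diff, pass through unchanged. Only fences immediately followed by `{name}` are treated as directives. Running the function on text that already uses colon fences leaves it as is.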


@@ -32,9 +32,9 @@ conda activate myenv
pip install vllm
```
-```{note}
+:::{note}
For non-CUDA platforms, please refer [here](#installation-index) for specific instructions on how to install vLLM.
-```
+:::
(quickstart-offline)=
@@ -69,9 +69,9 @@ The {class}`~vllm.LLM` class initializes vLLM's engine and the [OPT-125M model](
llm = LLM(model="facebook/opt-125m")
```
-```{note}
+:::{note}
By default, vLLM downloads models from [HuggingFace](https://huggingface.co/). If you would like to use models from [ModelScope](https://www.modelscope.cn), set the environment variable `VLLM_USE_MODELSCOPE` before initializing the engine.
-```
+:::
Now, the fun part! The outputs are generated using `llm.generate`. It adds the input prompts to the vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of `RequestOutput` objects, which include all of the output tokens.
@@ -97,10 +97,10 @@ Run the following command to start the vLLM server with the [Qwen2.5-1.5B-Instru
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
-```{note}
+:::{note}
By default, the server uses a predefined chat template stored in the tokenizer.
You can learn about overriding it [here](#chat-template).
-```
+:::
This server can be queried in the same format as OpenAI API. For example, to list the models: