[Doc] Improve GitHub links (#11491)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Author: Cyrus Leung
Date: 2024-12-26 06:49:26 +08:00
Committed by: GitHub
Parent: b689ada91e
Commit: 6ad909fdda
31 changed files with 147 additions and 136 deletions


@@ -29,7 +29,7 @@ Please refer to the {ref}`installation documentation <installation>` for more details
## Offline Batched Inference
-With vLLM installed, you can start generating texts for list of input prompts (i.e. offline batch inferencing). The example script for this section can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference.py).
+With vLLM installed, you can start generating texts for list of input prompts (i.e. offline batch inferencing). See the example script: <gh-file:examples/offline_inference.py>
The first line of this example imports the classes {class}`~vllm.LLM` and {class}`~vllm.SamplingParams`:
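As an aside, the offline batched inference flow this hunk documents looks roughly like the following sketch; the prompts, sampling settings, and model name are illustrative assumptions rather than part of the commit.

```python
from vllm import LLM, SamplingParams

# Illustrative prompts; any list of strings works for offline batching.
prompts = ["Hello, my name is", "The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Model name is assumed, reusing the one from the serve example further down.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```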
@@ -87,7 +87,8 @@ $ vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
```{note}
-By default, the server uses a predefined chat template stored in the tokenizer. You can learn about overriding it [here](https://github.com/vllm-project/vllm/blob/main/docs/source/serving/openai_compatible_server.md#chat-template).
+By default, the server uses a predefined chat template stored in the tokenizer.
+You can learn about overriding it [here](#chat-template).
```
This server can be queried in the same format as OpenAI API. For example, to list the models:
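As a rough sketch of that query, the official `openai` Python client can list the models exposed by a local vLLM server; the base URL and placeholder API key below are assumptions for a default deployment on port 8000.

```python
from openai import OpenAI

# Base URL and API key are assumptions for a default local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# List the models served by the OpenAI-compatible endpoint.
for model in client.models.list().data:
    print(model.id)
```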
@@ -130,7 +131,7 @@ completion = client.completions.create(model="Qwen/Qwen2.5-1.5B-Instruct",
print("Completion result:", completion)
```
-A more detailed client example can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/openai_completion_client.py).
+A more detailed client example can be found here: <gh-file:examples/openai_completion_client.py>
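For reference, a self-contained sketch of the completions call shown in this hunk might look as follows, again assuming a local server on the default port; the prompt text is illustrative.

```python
from openai import OpenAI

# Assumed defaults for a local vLLM deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    prompt="San Francisco is a",  # illustrative prompt
)
print("Completion result:", completion)
```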
### OpenAI Chat Completions API with vLLM