Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -118,7 +118,7 @@ print("Loaded chat template:", custom_template)
 outputs = llm.chat(conversation, chat_template=custom_template)
 ```
 
-## Online Inference
+## Online Serving
 
 Our [OpenAI-Compatible Server](#openai-compatible-server) provides endpoints that correspond to the offline APIs:
 
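For context on what the retitled section covers: the "online serving" counterpart of `llm.chat(...)` is the server's OpenAI-compatible chat endpoint. The sketch below is illustrative, not part of the diff; the model name, port, and prompt are assumptions, and it presumes a server started with `vllm serve <model>` on the default port 8000.

```python
# Minimal sketch of online serving via the OpenAI-compatible chat endpoint.
# Model name, port, and prompt are placeholder assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible API
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",    # must match the model the server was launched with
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```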
@@ -127,7 +127,7 @@ print(f"Score: {score}")
 
 A code example can be found here: <gh-file:examples/offline_inference/offline_inference_scoring.py>
 
-## Online Inference
+## Online Serving
 
 Our [OpenAI-Compatible Server](#openai-compatible-server) provides endpoints that correspond to the offline APIs:
 
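As a hedged illustration of the online counterpart to the offline scoring example referenced in this hunk: vLLM's server exposes a `/score` route for cross-encoder models. The model name and texts below are placeholders, and the sketch assumes the server was launched with a scoring-capable model (e.g. `vllm serve BAAI/bge-reranker-v2-m3`).

```python
# Sketch of the online scoring request; model name and texts are illustrative.
import requests

response = requests.post(
    "http://localhost:8000/score",
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "text_1": "What is the capital of France?",
        "text_2": "Paris is the capital of France.",
    },
)
print(response.json())
```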
@@ -552,7 +552,7 @@ See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the mod
 
 ````{important}
 To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
-or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+or `--limit-mm-per-prompt` (online serving). For example, to enable passing up to 4 images per text prompt:
 
 Offline inference:
 ```python
@@ -562,7 +562,7 @@ llm = LLM(
 )
 ```
 
-Online inference:
+Online serving:
 ```bash
 vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
 ```
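Putting the last two hunks together: the diff only shows fragments of the offline example (`llm = LLM(` and the closing `)`), so the following is a hedged reconstruction of the offline configuration that the `--limit-mm-per-prompt image=4` flag mirrors. Everything except the `limit_mm_per_prompt` argument is an assumption.

```python
# Hedged reconstruction of the offline counterpart to
# `vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4`;
# only the limit_mm_per_prompt setting appears in the diff itself.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    limit_mm_per_prompt={"image": 4},  # allow up to 4 images per text prompt
)
```

The images themselves are then attached through `multi_modal_data`, as described on the multi-modal inputs page referenced in the hunk header.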