Replace "online inference" with "online serving" (#11923)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Author: Harry Mellor
Date: 2025-01-10 12:05:56 +00:00
Committed by: GitHub
Parent: ef725feafc
Commit: d85c47d6ad
11 changed files with 16 additions and 16 deletions


@@ -5,7 +5,7 @@
 vLLM supports the generation of structured outputs using [outlines](https://github.com/dottxt-ai/outlines), [lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer), or [xgrammar](https://github.com/mlc-ai/xgrammar) as backends for the guided decoding.
 This document shows you some examples of the different options that are available to generate structured outputs.
-## Online Inference (OpenAI API)
+## Online Serving (OpenAI API)
 You can generate structured outputs using the OpenAI's [Completions](https://platform.openai.com/docs/api-reference/completions) and [Chat](https://platform.openai.com/docs/api-reference/chat) API.
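For context on the section this hunk renames, a minimal sketch of the kind of request the online-serving path accepts: vLLM's OpenAI-compatible server takes guided-decoding options as extra fields in the request body, e.g. `guided_choice` to constrain the output to a fixed set of strings. The base URL and model name below are placeholders, not values from this commit, and the payload is only constructed locally here rather than sent to a server:

```python
import json

# Placeholder endpoint; a real deployment would point at a running
# vLLM OpenAI-compatible server (this sketch never actually connects).
BASE_URL = "http://localhost:8000/v1"

# The request body follows the OpenAI Chat API, with vLLM's extra
# "guided_choice" field constraining the completion to one of the
# listed strings. The model name is a hypothetical example.
payload = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "messages": [
        {"role": "user", "content": "Classify the sentiment: vLLM is wonderful!"}
    ],
    "guided_choice": ["positive", "negative"],
}

# Serialize the body as it would be POSTed to BASE_URL + "/chat/completions".
body = json.dumps(payload)
```

With the official `openai` client, the same extra field would typically be passed via the `extra_body` argument so it is merged into the request body.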
@@ -239,7 +239,7 @@ The main available options inside `GuidedDecodingParams` are:
 - `backend`
 - `whitespace_pattern`
-These parameters can be used in the same way as the parameters from the Online Inference examples above.
+These parameters can be used in the same way as the parameters from the Online Serving examples above.
 One example for the usage of the `choices` parameter is shown below:
 ```python