# Structured Outputs
vLLM supports the generation of structured outputs using
[xgrammar](https://github.com/mlc-ai/xgrammar) or
[guidance](https://github.com/guidance-ai/llguidance) as backends.
This document shows you some examples of the different options that are
available to generate structured outputs.
!!! warning

    If you are still using the following deprecated API fields, which were removed in v0.12.0, please update your code to use `structured_outputs` as demonstrated in the rest of this document:

    - `guided_json` -> `{"structured_outputs": {"json": ...}}` or `StructuredOutputsParams(json=...)`
    - `guided_regex` -> `{"structured_outputs": {"regex": ...}}` or `StructuredOutputsParams(regex=...)`
    - `guided_choice` -> `{"structured_outputs": {"choice": ...}}` or `StructuredOutputsParams(choice=...)`
    - `guided_grammar` -> `{"structured_outputs": {"grammar": ...}}` or `StructuredOutputsParams(grammar=...)`
    - `guided_whitespace_pattern` -> `{"structured_outputs": {"whitespace_pattern": ...}}` or `StructuredOutputsParams(whitespace_pattern=...)`
    - `structural_tag` -> `{"structured_outputs": {"structural_tag": ...}}` or `StructuredOutputsParams(structural_tag=...)`
    - `guided_decoding_backend` -> Remove this field from your request
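If you have many call sites, the rename is mechanical. A small sketch of a helper that rewrites a request body (the function and variable names here are ours, not part of vLLM):

```python
# Sketch: migrate a request body from the removed `guided_*` fields to
# the `structured_outputs` field, following the mapping listed above.
RENAMES = {
    "guided_json": "json",
    "guided_regex": "regex",
    "guided_choice": "choice",
    "guided_grammar": "grammar",
    "guided_whitespace_pattern": "whitespace_pattern",
    "structural_tag": "structural_tag",
}

def migrate(body: dict) -> dict:
    """Move deprecated keys under `structured_outputs`; drop `guided_decoding_backend`."""
    new_body = {
        k: v
        for k, v in body.items()
        if k not in RENAMES and k != "guided_decoding_backend"
    }
    structured = {RENAMES[k]: v for k, v in body.items() if k in RENAMES}
    if structured:
        new_body["structured_outputs"] = structured
    return new_body

print(migrate({"guided_choice": ["positive", "negative"]}))
# {'structured_outputs': {'choice': ['positive', 'negative']}}
```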
## Online Serving (OpenAI API)
You can generate structured outputs using OpenAI's [Completions](https://platform.openai.com/docs/api-reference/completions) and [Chat](https://platform.openai.com/docs/api-reference/chat) API.
The following parameters are supported, which must be added as extra parameters:
- `choice`: the output will be exactly one of the choices.
- `regex`: the output will follow the regex pattern.
- `json`: the output will follow the JSON schema.
- `grammar`: the output will follow the context-free grammar.
- `structural_tag`: the output will follow a JSON schema within a set of specified tags in the generated text.
You can see the complete list of supported parameters on the [OpenAI-Compatible Server](../serving/openai_compatible_server.md) page.
Structured outputs are supported by default in the OpenAI-Compatible Server. You
may choose to specify the backend to use by passing the
`--structured-outputs-config.backend` flag to `vllm serve`. The default backend is `auto`,
which tries to choose an appropriate backend based on the details of the
request. You may also choose a specific backend, along with
some options. A full set of options is available in the `vllm serve --help`
text.
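For example, to pin the backend explicitly when launching the server (the model name here is only an illustration; the backend names come from the list at the top of this page):

```shell
# Serve a model with the xgrammar structured-outputs backend pinned.
vllm serve Qwen/Qwen2.5-1.5B-Instruct \
  --structured-outputs-config.backend xgrammar
```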
Now let's see an example for each of the cases, starting with `choice`, as it's the easiest one:
??? code

    ```python
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="-",
    )

    model = client.models.list().data[0].id

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ],
        extra_body={"structured_outputs": {"choice": ["positive", "negative"]}},
    )
    print(completion.choices[0].message.content)
    ```
The next example shows how to use the `regex` parameter. The idea is to generate an email address, given a simple regex template:
??? code

    ```python
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: alan.turing@enigma.com\n",
            }
        ],
        extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}, "stop": ["\n"]},
    )
    print(completion.choices[0].message.content)
    ```
One of the most relevant features in structured text generation is the option to generate valid JSON with pre-defined fields and formats.
For this we can use the `json` parameter in two different ways:

- Directly using a [JSON Schema](https://json-schema.org/)
- Defining a [Pydantic model](https://docs.pydantic.dev/latest/) and then extracting the JSON Schema from it (which is normally the easier option).
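For the first option, the `json` parameter accepts a plain JSON Schema object. A minimal sketch of a hand-written schema (the field names are illustrative):

```python
# A hand-written JSON Schema describing an object with two required
# fields; equivalent to a small Pydantic model with `name: str` and
# `age: int`.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# It would then be passed as, e.g.:
# extra_body={"structured_outputs": {"json": person_schema}}
```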
The next example shows how to use the `response_format` parameter with a Pydantic model:
??? code

    ```python
    from enum import Enum

    from pydantic import BaseModel

    class CarType(str, Enum):
        sedan = "sedan"
        suv = "SUV"
        truck = "Truck"
        coupe = "Coupe"

    class CarDescription(BaseModel):
        brand: str
        model: str
        car_type: CarType

    json_schema = CarDescription.model_json_schema()

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
            }
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "car-description",
                "schema": json_schema,
            },
        },
    )
    print(completion.choices[0].message.content)
    ```
!!! tip
    While not strictly necessary, it's usually better to indicate in the prompt the
    JSON schema and how the fields should be populated. This can notably improve
    results in most cases.
Finally we have the `grammar` option, which is probably the most
difficult to use, but it's really powerful. It allows us to define complete
languages like SQL queries. It works by using a context-free EBNF grammar.
As an example, we can use it to define a specific format of simplified SQL queries:
??? code

    ```python
    simplified_sql_grammar = """
    root ::= select_statement

    select_statement ::= "SELECT " column " from " table " where " condition

    column ::= "col_1 " | "col_2 "

    table ::= "table_1 " | "table_2 "

    condition ::= column "= " number

    number ::= "1 " | "2 "
    """

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
            }
        ],
        extra_body={"structured_outputs": {"grammar": simplified_sql_grammar}},
    )
    print(completion.choices[0].message.content)
    ```
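Because every production in this grammar is a choice between literals, it describes a small finite language. A quick sketch that enumerates every query the grammar can produce, mirroring each rule:

```python
from itertools import product

# Alternatives copied from the grammar's productions above.
columns = ["col_1 ", "col_2 "]      # column ::= "col_1 " | "col_2 "
tables = ["table_1 ", "table_2 "]   # table ::= "table_1 " | "table_2 "
numbers = ["1 ", "2 "]              # number ::= "1 " | "2 "

# select_statement ::= "SELECT " column " from " table " where " condition
# condition        ::= column "= " number
queries = [
    "SELECT " + col + " from " + tab + " where " + cond_col + "= " + num
    for col, tab, cond_col, num in product(columns, tables, columns, numbers)
]
print(len(queries))  # 16
print(queries[0])
```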
See also: [full example](../examples/online_serving/structured_outputs.md)
## Reasoning Outputs
You can also use structured outputs with <project:#reasoning-outputs> for reasoning models.
```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
```
Note that you can combine reasoning with any of the structured outputs features. The following example uses a JSON schema:
??? code

    ```python
    from pydantic import BaseModel

    class People(BaseModel):
        name: str
        age: int

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Generate a JSON with the name and age of one random person.",
            }
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "people",
                "schema": People.model_json_schema(),
            },
        },
    )
    print("reasoning: ", completion.choices[0].message.reasoning)
    print("content: ", completion.choices[0].message.content)
    ```
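Since the `content` field is constrained to the schema, it can be parsed directly with the standard library. A minimal sketch, using an illustrative response string in place of a live completion:

```python
import json

# Stand-in for completion.choices[0].message.content, which the schema
# guarantees is a JSON object with a string `name` and integer `age`.
content = '{"name": "Ada Lovelace", "age": 36}'
person = json.loads(content)
assert isinstance(person["name"], str)
assert isinstance(person["age"], int)
```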
See also: [full example](../examples/online_serving/structured_outputs.md)
## Experimental Automatic Parsing (OpenAI API)
This section covers the OpenAI beta wrapper over the `client.chat.completions.create()` method that provides richer integrations with Python-specific types.
At the time of writing (`openai==1.54.4`), this is a "beta" feature in the OpenAI client library. Code reference can be found [here](https://github.com/openai/openai-python/blob/52357cff50bee57ef442e94d78a0de38b4173fc2/src/openai/resources/beta/chat/completions.py#L100-L104).
For the following examples, vLLM was set up using `vllm serve meta-llama/Llama-3.1-8B-Instruct`.
Here is a simple example demonstrating how to get structured output using Pydantic models:
??? code

    ```python
    from openai import OpenAI
    from pydantic import BaseModel

    class Info(BaseModel):
        name: str
        age: int

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
    model = client.models.list().data[0].id

    completion = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
        ],
        response_format=Info,
    )

    message = completion.choices[0].message
    print(message)
    assert message.parsed
    print("Name:", message.parsed.name)
    print("Age:", message.parsed.age)
    ```
Output:

```console
ParsedChatCompletionMessage[Info](content='{"name": "Cameron", "age": 28}', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=Info(name='Cameron', age=28))
Name: Cameron
Age: 28
```
Here is a more complex example using nested Pydantic models to handle a step-by-step math solution:
??? code

    ```python
    from pydantic import BaseModel

    class Step(BaseModel):
        explanation: str
        output: str

    class MathResponse(BaseModel):
        steps: list[Step]
        final_answer: str

    completion = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful expert math tutor."},
            {"role": "user", "content": "Solve 8x + 31 = 2."},
        ],
        response_format=MathResponse,
    )

    message = completion.choices[0].message
    print(message)
    assert message.parsed
    for i, step in enumerate(message.parsed.steps):
        print(f"Step #{i}:", step)
    print("Answer:", message.parsed.final_answer)
    ```
Output:
```console
ParsedChatCompletionMessage[MathResponse](content='{ "steps": [{ "explanation": "First, let\'s isolate the term with the variable \'x\'. To do this, we\'ll subtract 31 from both sides of the equation.", "output": "8x + 31 - 31 = 2 - 31"}, { "explanation": "By subtracting 31 from both sides, we simplify the equation to 8x = -29.", "output": "8x = -29"}, { "explanation": "Next, let\'s isolate \'x\' by dividing both sides of the equation by 8.", "output": "8x / 8 = -29 / 8"}], "final_answer": "x = -29/8" }', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=MathResponse(steps=[Step(explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation.", output='8x + 31 - 31 = 2 - 31'), Step(explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.', output='8x = -29'), Step(explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8.", output='8x / 8 = -29 / 8')], final_answer='x = -29/8'))
Step #0: explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation." output='8x + 31 - 31 = 2 - 31'
Step #1: explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.' output='8x = -29'
Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8." output='8x / 8 = -29 / 8'
Answer: x = -29/8
```
An example of using `structural_tag` can be found here: [examples/online_serving/structured_outputs](../../examples/online_serving/structured_outputs)
## Offline Inference
Offline inference allows for the same types of structured outputs.
To use it, we need to configure structured outputs using the `StructuredOutputsParams` class inside `SamplingParams`.
The main available options inside `StructuredOutputsParams` are:
- `json`
- `regex`
- `choice`
- `grammar`
- `structural_tag`
These parameters can be used in the same way as the parameters from the Online
Serving examples above. An example using the `choice` parameter is
shown below:
??? code

    ```python
    from vllm import LLM, SamplingParams
    from vllm.sampling_params import StructuredOutputsParams

    llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

    structured_outputs_params = StructuredOutputsParams(choice=["Positive", "Negative"])
    sampling_params = SamplingParams(structured_outputs=structured_outputs_params)
    outputs = llm.generate(
        prompts="Classify this sentiment: vLLM is wonderful!",
        sampling_params=sampling_params,
    )
    print(outputs[0].outputs[0].text)
    ```
See also: [full example](../examples/online_serving/structured_outputs.md)