Make distinct code and console admonitions so readers are less likely to miss them (#20585)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -20,7 +20,7 @@ To input multi-modal data, follow this schema in [vllm.inputs.PromptType][]:
You can pass a single image to the `'image'` field of the multi-modal dictionary, as shown in the following examples:

-??? Code
+??? code

    ```python
    from vllm import LLM
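The hunk above cuts the example off after the import; as a rough, self-contained sketch of the single-image usage it refers to (the LLaVA checkpoint, prompt template, and image path are illustrative placeholders, not taken from this diff):

```python
from PIL import Image
from vllm import LLM

# Any single-image VLM supported by vLLM works here; LLaVA-1.5 is just an example.
llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# Load the image with PIL and pass it via the 'image' key of multi_modal_data.
image = Image.open("example.jpg")

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is shown in this image?\nASSISTANT:",
    "multi_modal_data": {"image": image},
})

for o in outputs:
    print(o.outputs[0].text)
```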
@@ -68,7 +68,7 @@ Full example: <gh-file:examples/offline_inference/vision_language.py>
To substitute multiple images inside the same text prompt, you can pass in a list of images instead:

-??? Code
+??? code

    ```python
    from vllm import LLM
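A sketch of the multi-image case, assuming Phi-3.5-vision and its image placeholder tokens (model choice, prompt template, and file names are illustrative); `limit_mm_per_prompt` has to allow as many images as you pass:

```python
from PIL import Image
from vllm import LLM

# Allow up to two images per prompt for this model.
llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},
)

prompt = "<|user|>\n<|image_1|>\n<|image_2|>\nWhat is shown in each image?<|end|>\n<|assistant|>\n"
images = [Image.open("cat.jpg"), Image.open("dog.jpg")]

outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": images},  # a list of images instead of a single one
})
```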
@@ -146,7 +146,7 @@ for o in outputs:
Multi-image input can be extended to perform video captioning. We show this with [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as it supports videos:

-??? Code
+??? code

    ```python
    from vllm import LLM
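A sketch of the video-captioning usage with Qwen2-VL; the frame array and chat template below are illustrative assumptions (real frames would come from a video decoder such as OpenCV or decord):

```python
import numpy as np
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")

# A video is passed as an array of RGB frames; these zeros are placeholders.
video = np.zeros((16, 224, 224, 3), dtype=np.uint8)

prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|video_pad|><|vision_end|>"
    "Describe this video.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"video": video},
})
```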
@@ -193,7 +193,7 @@ Full example: <gh-file:examples/offline_inference/audio_language.py>
To input pre-computed embeddings belonging to a data type (i.e. image, video, or audio) directly to the language model,
pass a tensor of shape `(num_items, feature_size, hidden_size of LM)` to the corresponding field of the multi-modal dictionary.

-??? Code
+??? code

    ```python
    from vllm import LLM
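A sketch of passing pre-computed embeddings, assuming a model that accepts image embeddings; the tensor here is random and its dimensions merely stand in for `(num_items, feature_size, hidden_size of LM)`:

```python
import torch
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# Shape (num_items, feature_size, hidden_size of the LM); values are placeholders,
# real embeddings would come from the model's vision encoder.
image_embeds = torch.rand(1, 576, 4096)

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
    "multi_modal_data": {"image": image_embeds},
})
```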
@@ -220,7 +220,7 @@ pass a tensor of shape `(num_items, feature_size, hidden_size of LM)` to the cor
For Qwen2-VL and MiniCPM-V, we accept additional parameters alongside the embeddings:

-??? Code
+??? code

    ```python
    # Construct the prompt based on your model
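As a sketch of the Qwen2-VL variant, assuming the model-specific `image_grid_thw` metadata key is the extra parameter passed alongside the embeddings; the embedding file, grid values, and prompt template are placeholders:

```python
import torch
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")

# Placeholder pre-computed embeddings and patch-grid metadata for one image.
image_embeds = torch.load("image_embeds.pt")      # (num_items, feature_size, hidden_size of LM)
image_grid_thw = torch.tensor([[1, 32, 32]])      # temporal / height / width patch counts

outputs = llm.generate({
    "prompt": "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
              "Describe the image.<|im_end|>\n<|im_start|>assistant\n",
    "multi_modal_data": {
        "image": {
            "image_embeds": image_embeds,
            "image_grid_thw": image_grid_thw,
        }
    },
})
```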
@@ -288,7 +288,7 @@ vllm serve microsoft/Phi-3.5-vision-instruct --task generate \
Then, you can use the OpenAI client as follows:

-??? Code
+??? code

    ```python
    from openai import OpenAI
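A sketch of the client side for the image case; vLLM exposes an OpenAI-compatible API (port 8000 by default), and the base URL, API key, and image URL below are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="microsoft/Phi-3.5-vision-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(chat_response.choices[0].message.content)
```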
@@ -366,7 +366,7 @@ vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf --task generate --max-model
Then, you can use the OpenAI client as follows:

-??? Code
+??? code

    ```python
    from openai import OpenAI
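For the video-capable server above, the client call is analogous but uses a `video_url` content part; the base URL and video URL are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
        ],
    }],
)
print(chat_response.choices[0].message.content)
```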
@@ -430,7 +430,7 @@ vllm serve fixie-ai/ultravox-v0_5-llama-3_2-1b
Then, you can use the OpenAI client as follows:

-??? Code
+??? code

    ```python
    import base64
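A sketch of the audio case started in this hunk, base64-encoding a local WAV file into an `input_audio` content part (the file name and base URL are placeholders):

```python
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Read a local audio file and base64-encode it for the request body.
with open("sample.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="fixie-ai/ultravox-v0_5-llama-3_2-1b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is being said in this audio?"},
            {"type": "input_audio", "input_audio": {"data": audio_base64, "format": "wav"}},
        ],
    }],
)
```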
@@ -486,7 +486,7 @@ Then, you can use the OpenAI client as follows:
Alternatively, you can pass `audio_url`, which is the audio counterpart of `image_url` for image input:

-??? Code
+??? code

    ```python
    chat_completion_from_url = client.chat.completions.create(
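A sketch of the `audio_url` variant whose call the hunk above truncates; the client setup and audio URL are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_completion_from_url = client.chat.completions.create(
    model="fixie-ai/ultravox-v0_5-llama-3_2-1b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this audio?"},
            {"type": "audio_url", "audio_url": {"url": "https://example.com/sample.wav"}},
        ],
    }],
)
```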
@@ -531,7 +531,7 @@ pass a tensor of shape to the corresponding field of the multi-modal dictionary.
For image embeddings, you can pass the base64-encoded tensor to the `image_embeds` field.
The following example demonstrates how to pass image embeddings to the OpenAI server:

-??? Code
+??? code

    ```python
    image_embedding = torch.load(...)
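A sketch of sending a base64-encoded embedding tensor to the `image_embeds` field mentioned above; the serialization via `torch.save` into an in-memory buffer, the embedding path, and the model name are assumptions for illustration, not pinned down by this diff:

```python
import base64
import io

import torch
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder path; in practice this is an embedding exported ahead of time.
image_embedding = torch.load("image_embedding.pt")

# Serialize the tensor and base64-encode it for the request body.
buffer = io.BytesIO()
torch.save(image_embedding, buffer)
buffer.seek(0)
base64_image_embedding = base64.b64encode(buffer.read()).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_embeds", "image_embeds": base64_image_embedding},
        ],
    }],
)
```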