Make distinct code and console admonitions so readers are less likely to miss them (#20585)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-08 03:55:28 +01:00
parent 31c5d0a1b7
commit af107d5a0e
52 changed files with 192 additions and 162 deletions
--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -18,7 +18,7 @@ Speculative decoding is a technique which improves inter-token latency in memory

 The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.

-??? Code
+??? code

    ```python
    from vllm import LLM, SamplingParams
@@ -62,7 +62,7 @@ python -m vllm.entrypoints.openai.api_server \

 Then use a client:

-??? Code
+??? code

    ```python
    from openai import OpenAI
@@ -103,7 +103,7 @@ Then use a client:
 The following code configures vLLM to use speculative decoding where proposals are generated by
 matching n-grams in the prompt. For more information read [this thread.](https://x.com/joao_gante/status/1747322413006643259)

-??? Code
+??? code

    ```python
    from vllm import LLM, SamplingParams
@@ -137,7 +137,7 @@ draft models that conditioning draft predictions on both context vectors and sam
 For more information see [this blog](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) or
 [this technical report](https://arxiv.org/abs/2404.19124).

-??? Code
+??? code

    ```python
    from vllm import LLM, SamplingParams
@@ -185,7 +185,7 @@ A variety of speculative models of this type are available on HF hub:
 The following code configures vLLM to use speculative decoding where proposals are generated by
 an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).

-??? Code
+??? code

    ```python
    from vllm import LLM, SamplingParams