Kyle Sayers 64ac1395e8 [Docs] Clean up speculators docs (#34065)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

2026-02-18 13:48:11 -08:00

N-Gram Speculation

The following code configures vLLM to use speculative decoding in which draft tokens are proposed by matching n-grams in the prompt, so no separate draft model is required. For more information, read this thread.

```python
from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="Qwen/Qwen3-8B",
    tensor_parallel_size=1,
    speculative_config={
        # Propose draft tokens by n-gram lookup in the prompt.
        "method": "ngram",
        # Maximum number of draft tokens proposed per decoding step.
        "num_speculative_tokens": 5,
        # Longest n-gram (lookup window) to match.
        "prompt_lookup_max": 4,
    },
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
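To make the `prompt_lookup_max` and `num_speculative_tokens` settings above more concrete, here is a minimal pure-Python sketch of one n-gram speculation step. It is an illustration under stated assumptions, not vLLM's implementation. `propose_ngram`, `verify_greedy`, and `target_next` are hypothetical names. `prompt_lookup_min` is assumed to default to 1. The sketch takes the most recent earlier match, whereas vLLM's proposer may pick a match differently. Verification is shown in its greedy (temperature 0) form, checked one token at a time. A real engine scores all draft positions in a single forward pass, and with `temperature=0.8` it uses a sampling-based acceptance rule instead of exact matching.

```python
from typing import Callable, List, Sequence


def propose_ngram(
    tokens: Sequence[int],
    num_speculative_tokens: int,
    prompt_lookup_max: int,
    prompt_lookup_min: int = 1,  # assumed default for this sketch
) -> List[int]:
    """Hypothetical prompt-lookup proposer (simplified sketch).

    Tries the longest suffix n-gram first, finds its most recent earlier
    occurrence, and returns up to num_speculative_tokens tokens that followed it.
    """
    length = len(tokens)
    for n in range(min(prompt_lookup_max, length - 1), prompt_lookup_min - 1, -1):
        suffix = list(tokens[length - n:])
        # Scan start positions right-to-left, excluding the suffix itself.
        for start in range(length - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                follow = start + n
                return list(tokens[follow:follow + num_speculative_tokens])
    return []  # no match: nothing to speculate this step


def verify_greedy(
    tokens: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Keep the longest draft prefix the target model agrees with, then add
    one token chosen by the target (a correction or a bonus token)."""
    accepted: List[int] = []
    context = list(tokens)
    for tok in draft:
        expected = target_next(context)  # hypothetical greedy target model
        if tok != expected:
            accepted.append(expected)  # first disagreement ends the step
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # all drafts accepted: bonus token
    return accepted


if __name__ == "__main__":
    prompt = [1, 2, 3, 4, 9, 1, 2, 3]
    draft = propose_ngram(prompt, num_speculative_tokens=5, prompt_lookup_max=4)
    print(draft)  # [4, 9, 1, 2, 3]

    # Toy "target model" that follows a fixed reference continuation.
    reference = [1, 2, 3, 4, 9, 1, 2, 3, 4, 9, 7, 8]
    print(verify_greedy(prompt, draft, lambda ctx: reference[len(ctx)]))  # [4, 9, 7]
```

In this toy run, the suffix `[1, 2, 3]` matches the start of the prompt, so the proposer drafts the five tokens that followed it. The target accepts two of them and then corrects the third, which yields three tokens from a single verification step. Prompts with repeated spans, such as code editing or summarization with quotes, tend to produce more matches.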