# N-Gram Speculation

The following code configures vLLM to use speculative decoding where proposals are generated by
matching n-grams in the prompt. For more information, read [this thread](https://x.com/joao_gante/status/1747322413006643259).

```python
from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="Qwen/Qwen3-8B",
    tensor_parallel_size=1,
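    speculative_config={
        "method": "ngram",  # proposals come from n-gram matches, not a draft model
        "num_speculative_tokens": 5,  # propose up to 5 tokens per step
        "prompt_lookup_max": 4,  # largest n-gram size to match in the prompt
    },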
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
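
To make the matching idea concrete, here is a minimal pure-Python sketch of n-gram lookup. It is not vLLM's implementation: the function `propose_ngram` and its parameters `max_n`, `min_n`, and `num_tokens` are hypothetical names chosen for illustration, and the search order (longest n-gram first, most recent match first) is an assumption. The sketch takes the last `n` tokens of the sequence, looks for an earlier occurrence of them, and proposes the tokens that followed that occurrence.

```python
def propose_ngram(token_ids, max_n=4, min_n=1, num_tokens=5):
    """Illustrative sketch only (hypothetical helper, not part of vLLM).

    Returns up to `num_tokens` proposed tokens, or [] if no n-gram matches.
    """
    # Try the longest n-gram first; fall back to shorter ones.
    for n in range(max_n, min_n - 1, -1):
        if len(token_ids) <= n:
            continue  # sequence too short to contain an earlier n-gram
        suffix = token_ids[-n:]
        # Scan earlier windows, most recent first, skipping the suffix itself.
        for start in range(len(token_ids) - n - 1, -1, -1):
            if token_ids[start:start + n] == suffix:
                # Propose the tokens that followed the earlier occurrence.
                return token_ids[start + n:start + n + num_tokens]
    return []


# The sequence ends in [1, 2, 3], which also appears at the start,
# so the tokens that followed it there are proposed.
print(propose_ngram([1, 2, 3, 4, 5, 6, 1, 2, 3], num_tokens=3))  # [4, 5, 6]
```

In speculative decoding, the target model then verifies the proposed tokens. This kind of lookup tends to help most when the output repeats spans of the input, as in summarization or code editing.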