[Docs] Clean up speculators docs (#34065)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-02-18 16:48:11 -05:00
parent 61cf087680
commit 64ac1395e8
15 changed files with 936 additions and 342 deletions
--- a/docs/features/speculative_decoding/n_gram.md
+++ b/docs/features/speculative_decoding/n_gram.md
@@ -0,0 +1,27 @@
+# N-Gram Speculation
+
+The following code configures vLLM to use speculative decoding, where proposals are generated by
+matching n-grams in the prompt. For more information, read [this thread](https://x.com/joao_gante/status/1747322413006643259).
+
+```python
+from vllm import LLM, SamplingParams
+
+prompts = ["The future of AI is"]
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+
+llm = LLM(
+    model="Qwen/Qwen3-8B",
+    tensor_parallel_size=1,
+    speculative_config={
+        "method": "ngram",
+        "num_speculative_tokens": 5,  # number of tokens proposed per step
+        "prompt_lookup_max": 4,  # largest n-gram window used for matching
+    },
+)
+outputs = llm.generate(prompts, sampling_params)
+
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```