[Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>

docs/features/speculative_decoding/mtp.md (new file)

# MTP (Multi-Token Prediction)

MTP is a speculative decoding method in which the target model itself includes
native multi-token prediction capability. Unlike draft-model-based methods, you
do not need to provide a separate draft model.

MTP is useful when:

- Your model natively supports MTP.
- You want model-based speculative decoding with minimal extra configuration.

## Offline Example

```python
from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="XiaomiMiMo/MiMo-7B-Base",
    tensor_parallel_size=1,
    speculative_config={
        "method": "mtp",
        "num_speculative_tokens": 1,
    },
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
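
To gauge what MTP buys you on your own hardware, you can time a fixed-budget
generation with and without it. The snippet below is a minimal sketch using
only the API shown above plus the standard library; the prompt and token budget
are illustrative assumptions. Run it once as-is and once with the
`speculative_config` argument removed, then compare the timings.

```python
import time

from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
# Greedy sampling with a fixed token budget keeps runs comparable
# across configurations (speculative decoding is lossless, so the
# generated tokens are the same either way).
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

llm = LLM(
    model="XiaomiMiMo/MiMo-7B-Base",
    speculative_config={"method": "mtp", "num_speculative_tokens": 1},
)

# Warm-up run so one-time startup costs don't skew the timing.
llm.generate(prompts, sampling_params)

start = time.perf_counter()
llm.generate(prompts, sampling_params)
print(f"generation took {time.perf_counter() - start:.2f}s")
```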

## Online Example

```bash
vllm serve XiaomiMiMo/MiMo-7B-Base \
    --tensor-parallel-size 1 \
    --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
```
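
Speculative decoding is transparent to clients: once the server is up, you
query it through the usual OpenAI-compatible API. A minimal sketch, assuming
the server above is running at the default `http://localhost:8000` and the
`openai` Python package is installed:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="XiaomiMiMo/MiMo-7B-Base",
    prompt="The future of AI is",
    max_tokens=64,
    temperature=0.8,
)
print(completion.choices[0].text)
```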

## Notes

- MTP is only available for model families whose vLLM implementation includes
  native multi-token prediction support, such as the MiMo model used above.
- `num_speculative_tokens` controls speculative depth. A small value like `1`
  is a good default to start with; see the sketch after this list for deeper
  speculation.
- If your model does not support MTP, use another method such as EAGLE or a
  separate draft model.
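
As mentioned in the notes, speculative depth is a single field in the config.
The snippet below is a sketch that raises it to two draft tokens per step;
whether this helps depends on how often the extra tokens are accepted, so treat
the value as a starting point rather than a recommendation.

```python
from vllm import LLM

# Same setup as the offline example, but proposing two tokens per step.
# Deeper speculation saves target-model steps only when the extra draft
# tokens are usually accepted; otherwise it wastes draft compute.
llm = LLM(
    model="XiaomiMiMo/MiMo-7B-Base",
    speculative_config={
        "method": "mtp",
        "num_speculative_tokens": 2,
    },
)
```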