[Doc] Add MTP docs and update speculative decoding guidance (#35197)

Signed-off-by: liuxing <945764858@qq.com>
2026-03-05 01:23:34 +08:00
parent 28028dff2f
commit 7cc6058ac6
3 changed files with 79 additions and 4 deletions
--- a/docs/features/speculative_decoding/mlp.md
+++ b/docs/features/speculative_decoding/mlp.md
@@ -11,10 +11,10 @@ prompts = ["The future of AI is"]
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

 llm = LLM(
-    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
-    tensor_parallel_size=4,
+    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
+    tensor_parallel_size=1,
     speculative_config={
-        "model": "ibm-ai-platform/llama3-70b-accelerator",
+        "model": "ibm-ai-platform/llama3-8b-accelerator",
         "draft_tensor_parallel_size": 1,
         "method": "mlp_speculator",
     },
@@ -27,6 +27,12 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

+!!! warning "Known issue"
+    `ibm-ai-platform/llama3-70b-accelerator` can fail with:
+    `AttributeError: 'MLPSpeculatorConfig' object has no attribute 'num_attention_heads'`.
+    Track status in [#34106](https://github.com/vllm-project/vllm/issues/34106)
+    and [#34163](https://github.com/vllm-project/vllm/pull/34163).
+
 ## Pre-Trained MLP Drafter Models

 A variety of speculative models of this type are available on HF hub: