# Parallel Draft Models

The following code configures vLLM to use speculative decoding in which draft proposals are generated by [PARD](https://arxiv.org/pdf/2504.18583) (Parallel Draft Models).

## PARD Offline Mode Example

```python
from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="Qwen/Qwen3-8B",
    tensor_parallel_size=1,
    speculative_config={
        # PARD draft model that proposes tokens for the target model to verify
        "model": "amd/PARD-Qwen3-0.6B",
        # number of draft tokens proposed per decoding step
        "num_speculative_tokens": 12,
        "method": "draft_model",
        # enable PARD-style parallel drafting
        "parallel_drafting": True,
    },
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## PARD Online Mode Example

```bash
vllm serve Qwen/Qwen3-4B \
    --host 0.0.0.0 \
    --port 8000 \
    --seed 42 \
    -tp 1 \
    --max_model_len 2048 \
    --gpu_memory_utilization 0.8 \
    --speculative_config '{"model": "amd/PARD-Qwen3-0.6B", "num_speculative_tokens": 12, "method": "draft_model", "parallel_drafting": true}'
```

Once the server is up, it can be queried through vLLM's standard OpenAI-compatible API; speculative decoding happens entirely server-side, so no client-side changes are needed (a minimal client sketch follows at the end of this section).

## Pre-trained PARD Weights

- [amd/pard](https://huggingface.co/collections/amd/pard)
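Assuming the server started above is reachable at `localhost:8000`, the following is a minimal sketch of querying it via vLLM's OpenAI-compatible completions endpoint. The `openai` client package and the `api_key="EMPTY"` placeholder are assumptions, not requirements; any HTTP client works.

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; point the client at the local server.
# The api_key value is a placeholder and is ignored unless the server was
# started with an API key (assumption: no key configured here).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Qwen/Qwen3-4B",  # must match the model name passed to `vllm serve`
    prompt="The future of AI is",
    max_tokens=64,
    temperature=0.8,
)
print(completion.choices[0].text)
```

Because drafting and verification are handled inside the server, the response has the same shape as from a non-speculative deployment; only the generation latency changes.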