# vLLM Kimi-K2.5-Thinking Eagle3 Drafter

A convenience Docker image that bundles the [Eagle3 drafter model](https://huggingface.co/nvidia/Kimi-K2.5-Thinking-Eagle3) into the vLLM container, so you can deploy speculative decoding without a separate model download step.

## What's Inside

- **Base image:** `vllm/vllm-openai:v0.19.0`
- **Drafter model:** `nvidia/Kimi-K2.5-Thinking-Eagle3` (Eagle3 speculator layers) extracted to `/opt/`

> **Note:** This drafter only works with `nvidia/Kimi-K2-Thinking-NVFP4`, the text generation model. It is **not** compatible with the multimodal Kimi 2.5.

## Pull

```bash
docker pull atl.vultrcr.com/vllm/vllm-kimi25-eagle:v0.19.0
```

## Usage

Add the speculative decoding config to your vLLM launch args. Here's a known-working snippet of container `args` from a Kubernetes deployment (the target model argument is not shown):

```yaml
- "--tensor-parallel-size=8"
- "--trust-remote-code"
- "--gpu-memory-utilization=0.92"
- "--enable-auto-tool-choice"
- "--tool-call-parser=kimi_k2"
- "--reasoning-parser=kimi_k2"
- "--speculative_config"
- '{"model": "/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/models--nvidia--Kimi-K2.5-Thinking-Eagle3/snapshots/13dab2a34d650a93196d37f2af91f74b8b855bab", "draft_tensor_parallel_size": 1, "num_speculative_tokens": 3, "method": "eagle3"}'
```

### Speculative Config Breakdown

| Parameter | Value | Notes |
|---|---|---|
| `model` | `/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/...` | Path to the drafter inside the container |
| `draft_tensor_parallel_size` | `1` | Tensor-parallel size for the drafter |
| `num_speculative_tokens` | `3` | Number of draft tokens proposed per decoding step |
| `method` | `eagle3` | Speculative decoding method |
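
The JSON value is long and easy to break when quoted by hand. As a minimal sketch, assuming a POSIX shell, the same config could be assembled from variables. `DRAFTER_PATH`, `build_spec_config`, and `SPEC_CONFIG` are illustrative names and not part of vLLM or this image:

```bash
# Sketch: assemble the speculative config JSON from shell variables.
# All names below are hypothetical helpers, not vLLM options.
set -eu

# Drafter snapshot path as used in the deployment snippet above.
DRAFTER_PATH="/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/models--nvidia--Kimi-K2.5-Thinking-Eagle3/snapshots/13dab2a34d650a93196d37f2af91f74b8b855bab"

# Arguments: <drafter path> <draft TP size> <number of speculative tokens>
build_spec_config() {
  printf '{"model": "%s", "draft_tensor_parallel_size": %d, "num_speculative_tokens": %d, "method": "eagle3"}' \
    "$1" "$2" "$3"
}

SPEC_CONFIG="$(build_spec_config "$DRAFTER_PATH" 1 3)"
echo "$SPEC_CONFIG"
```

The printed string is the value that follows `--speculative_config` in the launch args.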

## Building

The Jenkins pipeline builds and pushes this image. Trigger a build with a specific tag:

```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-kimi25-eagle/buildWithParameters" \
  -u "$JENKINS_USER:$JENKINS_PASS" \
  -d "TAG=v0.19.0"
```

To build locally:

```bash
docker build -t atl.vultrcr.com/vllm/vllm-kimi25-eagle:v0.19.0 .
```
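
If the tag changes often, it may help to define it once so the local build uses the same image reference as the pull command. This is only a sketch: `REGISTRY`, `IMAGE_NAME`, and `image_ref` are illustrative names and not part of the Jenkins pipeline, and the script prints the build command rather than running it:

```bash
# Sketch: derive the full image reference from a single TAG value.
# REGISTRY, IMAGE_NAME, and image_ref are hypothetical helper names.
set -eu

REGISTRY="atl.vultrcr.com/vllm"
IMAGE_NAME="vllm-kimi25-eagle"
TAG="${TAG:-v0.19.0}"   # defaults to the tag used in this README

# Argument: <tag>
image_ref() {
  printf '%s/%s:%s' "$REGISTRY" "$IMAGE_NAME" "$1"
}

IMAGE_REF="$(image_ref "$TAG")"

# Dry run: print the command; drop the leading 'echo' to execute it.
echo docker build -t "$IMAGE_REF" .
```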
Add README 2026-04-13 17:28:30 +00:00
