diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d647749
--- /dev/null
+++ b/README.md
@@ -0,0 +1,56 @@
+# vLLM Kimi-K2.5-Thinking Eagle3 Drafter
+
+A convenience Docker image that bundles the [Eagle3 drafter model](https://huggingface.co/nvidia/Kimi-K2.5-Thinking-Eagle3) into the vLLM container, so you can deploy speculative decoding without a separate model download step.
+
+## What's Inside
+
+- **Base image:** `vllm/vllm-openai:v0.19.0`
+- **Drafter model:** `nvidia/Kimi-K2.5-Thinking-Eagle3` (Eagle3 speculator layers), extracted to `/opt/`
+
+> **Note:** This drafter only works with `nvidia/Kimi-K2-Thinking-NVFP4`, the text generation model. It is **not** compatible with the multimodal Kimi K2.5.
+
+## Pull
+
+```bash
+docker pull atl.vultrcr.com/vllm/vllm-kimi25-eagle:v0.19.0
+```
+
+## Usage
+
+Add the speculative decoding config to your vLLM launch args. Here's a known-working Kubernetes deployment snippet:
+
+```yaml
+- "--tensor-parallel-size=8"
+- "--trust-remote-code"
+- "--gpu-memory-utilization=0.92"
+- "--enable-auto-tool-choice"
+- "--tool-call-parser=kimi_k2"
+- "--reasoning-parser=kimi_k2"
+- "--speculative_config"
+- '{"model": "/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/models--nvidia--Kimi-K2.5-Thinking-Eagle3/snapshots/13dab2a34d650a93196d37f2af91f74b8b855bab", "draft_tensor_parallel_size": 1, "num_speculative_tokens": 3, "method": "eagle3"}'
+```
+
+### Speculative Config Breakdown
+
+| Parameter | Value | Notes |
+|---|---|---|
+| `model` | `/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/...` | Path to the drafter inside the container |
+| `draft_tensor_parallel_size` | `1` | Tensor-parallel size for the drafter |
+| `num_speculative_tokens` | `3` | Number of tokens to speculate per step |
+| `method` | `eagle3` | Speculative decoding method |
+
+## Building
+
+The Jenkins pipeline builds and pushes this image. Trigger a build with a specific tag:
+
+```bash
+curl -X POST "https://jenkins.sweetapi.com/job/vllm-kimi25-eagle/buildWithParameters" \
+  -u "$JENKINS_USER:$JENKINS_PASS" \
+  -d "TAG=v0.19.0"
+```
+
+To build locally:
+
+```bash
+docker build -t atl.vultrcr.com/vllm/vllm-kimi25-eagle:v0.19.0 .
+```
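
The `--speculative_config` value in the Usage section of this README is a hand-written JSON string, which is easy to break with a stray quote. As a minimal sketch, the same two launch args could be generated instead. The helper name `build_speculative_args` and its defaults are hypothetical and not part of this image or of vLLM; the keys and values are copied from the README's config.

```python
import json
import shlex

# Drafter snapshot path, copied verbatim from the README's Usage section.
DRAFTER_PATH = (
    "/opt/nvidia-Kimi-K2.5-Thinking-Eagle3/"
    "models--nvidia--Kimi-K2.5-Thinking-Eagle3/"
    "snapshots/13dab2a34d650a93196d37f2af91f74b8b855bab"
)


def build_speculative_args(model_path, num_speculative_tokens=3, draft_tp=1):
    """Hypothetical helper: return the flag and its JSON value as two args."""
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")
    config = {
        "model": model_path,
        "draft_tensor_parallel_size": draft_tp,
        "num_speculative_tokens": num_speculative_tokens,
        "method": "eagle3",
    }
    # json.dumps guarantees valid JSON quoting for the flag's value.
    return ["--speculative_config", json.dumps(config)]


if __name__ == "__main__":
    args = build_speculative_args(DRAFTER_PATH)
    # shlex.quote makes the JSON safe to paste into a shell command line.
    print(" ".join(shlex.quote(a) for a in args))
```

Each element of the returned list maps to one entry in a Kubernetes `args` list, matching the last two entries of the deployment snippet in the README.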