# Plan: Add Triton Kernels Support for vLLM v0.19.0

**Date:** 2026-04-06
**Status:** Ready for execution
**Branch:** `feature/triton-kernels` (to be created)

---

## Problem

The vLLM v0.19.0 container builds successfully but fails at runtime with:

```
No module named 'triton_kernels.matmul_ogs'
```

The error occurs because:

- `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
- vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
- Our Dockerfile builds vLLM via `pip wheel`, which skips the cmake step that normally fetches `triton_kernels`

---

## Root Cause Analysis

### What vLLM's cmake build does (that we skip):

1. Fetches `triton_kernels` from the Triton repo (tag v3.6.0) → copies it to `vllm/third_party/triton_kernels/`
2. Builds `flashmla`
3. Builds `vllm_flash_attn` (we already handle this separately)

### What's in `triton_kernels`:

Located at `python/triton_kernels/triton_kernels/` in the Triton repo:

- `matmul_ogs.py` — MoE kernels (the missing module)
- `topk.py` — Top-k routing
- `swiglu.py` — SwiGLU activation
- `tensor.py` — Tensor utilities
- `distributed.py` — Distributed ops

### PyPI confusion:

- `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — **NOT what vLLM needs**
- The real `triton_kernels` lives in the Triton repo itself

---

## Solution: Option A — Pip install from Git

Add a build stage that installs `triton_kernels` directly from the Triton repo:

```dockerfile
FROM build-base AS build-triton-kernels

# Install triton_kernels from the Triton repo (v3.6.0 matches vLLM's cmake default)
RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels

# Copy only the installed triton_kernels package into /wheels for the final stage
# (copying the whole site-packages dir would put it at the wrong path)
RUN mkdir -p /wheels && \
    pip show triton_kernels | grep '^Location' | cut -d' ' -f2 | \
    xargs -I {} cp -r {}/triton_kernels /wheels/
```

Then in the final `vllm-openai` stage:

```dockerfile
COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
```

### Why this works:

- vLLM's `import_utils.py` checks `site-packages` for `triton_kernels` first
- Installing it there means vLLM will find it
- Minimal change; doesn't affect existing working components

---

## Execution Steps

### 1. Create new branch

```bash
cd /home/openclaw/dev/grace-gpu-containers
git checkout -b feature/triton-kernels
```

### 2. Modify Dockerfile

Edit `vllm/Dockerfile`:

- Add a `build-triton-kernels` stage after `build-triton`
- Copy `triton_kernels` into the final stage
- Update the header comment with the new version info

### 3. Update CLAWMINE.md

Document the new build configuration.

### 4. Commit and push

```bash
git add -A
git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
git push origin feature/triton-kernels
```

### 5. Create new Jenkins pipeline

Create `gh200-vllm-tfa-build`, identical to `gh200-vllm-build` except that it:

- Pulls from the `feature/triton-kernels` branch
- Defaults `IMAGE_TAG=gh200-vllm-tfa`
- Defaults `VLLM_VERSION=v0.19.0`

### 6. Trigger build

Wait for Mike's OK before triggering.

---

## Tag Strategy

| Image | Tag | Purpose |
|-------|-----|---------|
| `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
| `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |

If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.

---

## Rollback Plan

If the build fails or runtime issues occur:

1. The existing `gh200-vllm:v0.19.0` image is untouched
2. Simply revert to that tag
3. No changes land on the main branch until the new image is verified working

---

## References

- vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
- vLLM import logic: `vllm/utils/import_utils.py`
- Triton repo: https://github.com/triton-lang/triton
- `matmul_ogs` location at the Triton v3.6.0 tag: `python/triton_kernels/triton_kernels/matmul_ogs.py`
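
---

## Verification Sketch

As a post-build sanity check, the import resolution the fix relies on can be sketched in Python. This is a hypothetical helper, not vLLM's actual `import_utils.py` logic; run inside the container, it should show `triton_kernels.matmul_ogs` resolving from `dist-packages` rather than failing:

```python
import importlib.util

def module_origin(name: str):
    """Return the file Python would import `name` from, or None if it is missing."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # The parent package (e.g. triton_kernels itself) is not installed at all
        return None
    return spec.origin if spec else None

# Inside the fixed image this should print a path under
# /usr/local/lib/python3.12/dist-packages/triton_kernels/
print(module_origin("triton_kernels.matmul_ogs") or "triton_kernels missing")
```

If this prints `triton_kernels missing`, the `COPY` in the final stage did not land the package on `sys.path` and the build needs another look.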