# Plan: Add Triton Kernels Support for vLLM v0.19.0

**Date:** 2026-04-06
**Status:** Ready for execution
**Branch:** `feature/triton-kernels` (to be created)

---

## Problem

The vLLM v0.19.0 container builds successfully but fails at runtime with:

```
No module named 'triton_kernels.matmul_ogs'
```

The error occurs because:

- `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
- vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
- Our Dockerfile builds vLLM via `pip wheel`, which skips the cmake step that normally fetches `triton_kernels`

---

## Root Cause Analysis

### What vLLM's cmake build does (that we skip):

1. Fetches `triton_kernels` from the Triton repo (tag v3.6.0) → copies it to `vllm/third_party/triton_kernels/`
2. Builds `flashmla`
3. Builds `vllm_flash_attn` (we already handle this separately)

### What's in `triton_kernels`:

Located at `python/triton_kernels/triton_kernels/` in the Triton repo:

- `matmul_ogs.py` — MoE kernels (the missing module)
- `topk.py` — Top-k routing
- `swiglu.py` — SwiGLU activation
- `tensor.py` — Tensor utilities
- `distributed.py` — Distributed ops

### PyPI confusion:

- `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — **NOT what vLLM needs**
- The real `triton_kernels` lives in the Triton repo itself

---

## Solution: Option A — Pip install from Git

Add a build stage that installs `triton_kernels` directly from the Triton repo:

```dockerfile
FROM build-base AS build-triton-kernels

# Install triton_kernels from the Triton repo (v3.6.0 matches vLLM's cmake default)
RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels

# Copy only the installed triton_kernels package into /wheels for the final stage
# (copying the whole site-packages dir would put it at the wrong path)
RUN mkdir -p /wheels && \
    pip show triton_kernels | grep '^Location' | cut -d' ' -f2 | \
    xargs -I {} cp -r {}/triton_kernels /wheels/
```

Then in the final `vllm-openai` stage:

```dockerfile
COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
```

### Why this works:

- vLLM's `import_utils.py` checks `site-packages` for `triton_kernels` first
- Installing it there means vLLM will find it
- Minimal change; doesn't affect existing working components

---

## Execution Steps

### 1. Create new branch

```bash
cd /home/openclaw/dev/grace-gpu-containers
git checkout -b feature/triton-kernels
```

### 2. Modify Dockerfile

Edit `vllm/Dockerfile`:

- Add a `build-triton-kernels` stage after `build-triton`
- Copy `triton_kernels` into the final stage
- Update the header comment with the new version info

### 3. Update CLAWMINE.md

Document the new build configuration.

### 4. Commit and push

```bash
git add -A
git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
git push origin feature/triton-kernels
```

### 5. Create new Jenkins pipeline

Create `gh200-vllm-tfa-build`, identical to `gh200-vllm-build` except that it:

- Pulls from the `feature/triton-kernels` branch
- Defaults `IMAGE_TAG=gh200-vllm-tfa`
- Defaults `VLLM_VERSION=v0.19.0`

### 6. Trigger build

Wait for Mike's OK before triggering.

---

## Tag Strategy

| Image | Tag | Purpose |
|-------|-----|---------|
| `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
| `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |

If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.

---

## Rollback Plan

If the build fails or runtime issues occur:

1. The existing `gh200-vllm:v0.19.0` image is untouched
2. Simply revert to that tag
3. No changes land on the main branch until the new image is verified working

---

## References

- vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
- vLLM import logic: `vllm/utils/import_utils.py`
- Triton repo: https://github.com/triton-lang/triton
- `matmul_ogs` location at the Triton v3.6.0 tag: `python/triton_kernels/triton_kernels/matmul_ogs.py`
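
---

## Verification Sketch

As a post-build sanity check, the import resolution the fix relies on can be sketched in Python. This is a hypothetical helper, not vLLM's actual `import_utils.py` logic; run inside the container, it should show `triton_kernels.matmul_ogs` resolving from `dist-packages` rather than failing:

```python
import importlib.util

def module_origin(name: str):
    """Return the file Python would import `name` from, or None if it is missing."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # The parent package (e.g. triton_kernels itself) is not installed at all
        return None
    return spec.origin if spec else None

# Inside the fixed image this should print a path under
# /usr/local/lib/python3.12/dist-packages/triton_kernels/
print(module_origin("triton_kernels.matmul_ogs") or "triton_kernels missing")
```

If this prints `triton_kernels missing`, the `COPY` in the final stage did not land the package on `sys.path` and the build needs another look.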