grace-gpu-containers/PLAN-triton-kernels.md
biondizzle e6cc28a942 Add triton_kernels for MoE support (vLLM v0.19.0)
- Add build-triton-kernels stage to fetch triton_kernels from Triton v3.6.0
- Install to site-packages for vLLM to find at runtime
- Resolves: No module named 'triton_kernels.matmul_ogs'
- Image tag: gh200-vllm-tfa:v0.19.0-tfa
2026-04-06 16:39:56 +00:00


Plan: Add Triton Kernels Support for vLLM v0.19.0

Date: 2026-04-06
Status: Ready for execution
Branch: feature/triton-kernels (to be created)


Problem

vLLM v0.19.0 container builds successfully but fails at runtime with:

No module named 'triton_kernels.matmul_ogs'

The error occurs because:

  • triton_kernels is a library of pre-written Triton kernels (separate from the triton compiler)
  • vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
  • Our Dockerfile builds vLLM as a standalone wheel (pip wheel), which skips the cmake step that normally fetches triton_kernels
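The failure is easy to reproduce in any Python environment that lacks the package (a quick sanity check, not specific to the container):

```shell
# Reproduce the missing-module error vLLM hits at runtime
python3 - <<'EOF'
try:
    import triton_kernels.matmul_ogs  # the module vLLM needs for MoE
    print("triton_kernels.matmul_ogs importable")
except ModuleNotFoundError as exc:
    print("reproduced:", exc)
EOF
```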

Root Cause Analysis

What vLLM's cmake build does (that we skip):

  1. Fetches triton_kernels from Triton repo (tag v3.6.0) → copies to vllm/third_party/triton_kernels/
  2. Builds flashmla
  3. Builds vllm_flash_attn (we already handle this separately)
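The fetch in step 1 can be sketched as explicit commands (an illustration of what the cmake external-project step automates, not part of this plan's Dockerfile; clone URL and tag per the plan):

```Dockerfile
# Sketch: manual equivalent of vLLM's cmake fetch of triton_kernels
RUN git clone --depth 1 --branch v3.6.0 \
        https://github.com/triton-lang/triton.git /tmp/triton \
 && cp -r /tmp/triton/python/triton_kernels/triton_kernels \
        vllm/third_party/triton_kernels \
 && rm -rf /tmp/triton
```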

What's in triton_kernels:

Located at python/triton_kernels/triton_kernels/ in the Triton repo:

  • matmul_ogs.py — MoE kernels (the missing module)
  • topk.py — Top-k routing
  • swiglu.py — SwiGLU activation
  • tensor.py — Tensor utilities
  • distributed.py — Distributed ops

PyPI confusion:

  • triton-kernels on PyPI (v0.1.0) is from "Kernelize AI" — NOT what vLLM needs
  • The real triton_kernels is in the Triton repo itself

Solution: Option A — Pip install from Git

Add a build stage that installs triton_kernels directly from the Triton repo:

FROM build-base AS build-triton-kernels
# Install triton_kernels from Triton repo (v3.6.0 matches vLLM's cmake default)
RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels
# Copy the installed package to wheels for final stage
RUN mkdir -p /wheels \
 && cp -r "$(pip show triton_kernels | awk '/^Location:/ {print $2}')/triton_kernels" /wheels/

Then in the final vllm-openai stage:

COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
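A build-time sanity check right after the COPY catches a bad path early (a sketch; assumes python3 and triton are already present in the final stage, since matmul_ogs imports triton):

```Dockerfile
# Fail the build immediately if the copied package is incomplete
RUN python3 -c "import triton_kernels.matmul_ogs"
```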

Why this works:

  • vLLM's import_utils.py checks site-packages for triton_kernels first
  • Installing it there means vLLM will find it
  • Minimal change, doesn't affect existing working components
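Whether the copy is actually picked up can be confirmed from inside the image with a resolution check (a sketch of the idea; the real gating logic lives in vllm/utils/import_utils.py):

```shell
# Print where Python resolves triton_kernels from; inside the new image this
# should point at /usr/local/lib/python3.12/dist-packages/triton_kernels
python3 - <<'EOF'
import importlib.util
spec = importlib.util.find_spec("triton_kernels")
print("resolved from:", spec.origin if spec else "not found on sys.path")
EOF
```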

Execution Steps

1. Create new branch

cd /home/openclaw/dev/grace-gpu-containers
git checkout -b feature/triton-kernels

2. Modify Dockerfile

Edit vllm/Dockerfile:

  • Add build-triton-kernels stage after build-triton
  • Copy triton_kernels to final stage
  • Update header comment with new version info

3. Update CLAWMINE.md

Document the new build configuration.

4. Commit and push

git add -A
git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
git push origin feature/triton-kernels

5. Create new Jenkins pipeline

Create gh200-vllm-tfa-build:

  • Same as gh200-vllm-build but:
    • Pulls from feature/triton-kernels branch
    • Default IMAGE_TAG=gh200-vllm-tfa
    • Default VLLM_VERSION=v0.19.0

6. Trigger build

Wait for Mike's OK before triggering.


Tag Strategy

Image            Tag           Purpose
gh200-vllm       v0.19.0       Working fallback (no triton_kernels)
gh200-vllm-tfa   v0.19.0-tfa   New build with triton_kernels

If successful, gh200-vllm-tfa:v0.19.0-tfa becomes the production image.


Rollback Plan

If the build fails or runtime issues occur:

  1. The existing gh200-vllm:v0.19.0 image is untouched
  2. Just revert to using that tag
  3. No changes to main branch until verified working

References

  • vLLM cmake config: cmake/external_projects/triton_kernels.cmake
  • vLLM import logic: vllm/utils/import_utils.py
  • Triton repo: https://github.com/triton-lang/triton
  • matmul_ogs source (Triton repo, v3.6.0 tag): python/triton_kernels/triton_kernels/matmul_ogs.py