grace-gpu-containers/PLAN-triton-kernels.md
biondizzle e6cc28a942 Add triton_kernels for MoE support (vLLM v0.19.0)
- Add build-triton-kernels stage to fetch triton_kernels from Triton v3.6.0
- Install to site-packages for vLLM to find at runtime
- Resolves: No module named 'triton_kernels.matmul_ogs'
- Image tag: gh200-vllm-tfa:v0.19.0-tfa
2026-04-06 16:39:56 +00:00


Plan: Add Triton Kernels Support for vLLM v0.19.0

Date: 2026-04-06
Status: Ready for execution
Branch: feature/triton-kernels (to be created)


Problem

vLLM v0.19.0 container builds successfully but fails at runtime with:

No module named 'triton_kernels.matmul_ogs'

The error occurs because:

  • triton_kernels is a library of pre-written Triton kernels (separate from the triton compiler)
  • vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
  • Our Dockerfile builds vLLM as a standalone wheel (pip wheel), which skips the cmake step that normally fetches triton_kernels
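The failure is easy to reproduce in any Python environment that lacks the package (a quick sanity check, not specific to the container):

```shell
# Reproduce the missing-module error vLLM hits at runtime
python3 - <<'EOF'
try:
    import triton_kernels.matmul_ogs  # the module vLLM needs for MoE
    print("triton_kernels.matmul_ogs importable")
except ModuleNotFoundError as exc:
    print("reproduced:", exc)
EOF
```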

Root Cause Analysis

What vLLM's cmake build does (that we skip):

  1. Fetches triton_kernels from Triton repo (tag v3.6.0) → copies to vllm/third_party/triton_kernels/
  2. Builds flashmla
  3. Builds vllm_flash_attn (we already handle this separately)
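The fetch in step 1 can be sketched as explicit commands (an illustration of what the cmake external-project step automates, not part of this plan's Dockerfile; clone URL and tag per the plan):

```Dockerfile
# Sketch: manual equivalent of vLLM's cmake fetch of triton_kernels
RUN git clone --depth 1 --branch v3.6.0 \
        https://github.com/triton-lang/triton.git /tmp/triton \
 && cp -r /tmp/triton/python/triton_kernels/triton_kernels \
        vllm/third_party/triton_kernels \
 && rm -rf /tmp/triton
```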

What's in triton_kernels:

Located at python/triton_kernels/triton_kernels/ in the Triton repo:

  • matmul_ogs.py — MoE kernels (the missing module)
  • topk.py — Top-k routing
  • swiglu.py — SwiGLU activation
  • tensor.py — Tensor utilities
  • distributed.py — Distributed ops

PyPI confusion:

  • triton-kernels on PyPI (v0.1.0) is from "Kernelize AI" — NOT what vLLM needs
  • The real triton_kernels is in the Triton repo itself

Solution: Option A — Pip install from Git

Add a build stage that installs triton_kernels directly from the Triton repo:

FROM build-base AS build-triton-kernels
# Install triton_kernels from Triton repo (v3.6.0 matches vLLM's cmake default)
RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels
# Copy the installed package to wheels for final stage
RUN mkdir -p /wheels \
 && cp -r "$(pip show triton_kernels | awk '/^Location:/ {print $2}')/triton_kernels" /wheels/

Then in the final vllm-openai stage:

COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
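A build-time sanity check right after the COPY catches a bad path early (a sketch; assumes python3 and triton are already present in the final stage, since matmul_ogs imports triton):

```Dockerfile
# Fail the build immediately if the copied package is incomplete
RUN python3 -c "import triton_kernels.matmul_ogs"
```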

Why this works:

  • vLLM's import_utils.py checks site-packages for triton_kernels first
  • Installing it there means vLLM will find it
  • Minimal change, doesn't affect existing working components
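Whether the copy is actually picked up can be confirmed from inside the image with a resolution check (a sketch of the idea; the real gating logic lives in vllm/utils/import_utils.py):

```shell
# Print where Python resolves triton_kernels from; inside the new image this
# should point at /usr/local/lib/python3.12/dist-packages/triton_kernels
python3 - <<'EOF'
import importlib.util
spec = importlib.util.find_spec("triton_kernels")
print("resolved from:", spec.origin if spec else "not found on sys.path")
EOF
```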

Execution Steps

1. Create new branch

cd /home/openclaw/dev/grace-gpu-containers
git checkout -b feature/triton-kernels

2. Modify Dockerfile

Edit vllm/Dockerfile:

  • Add build-triton-kernels stage after build-triton
  • Copy triton_kernels to final stage
  • Update header comment with new version info

3. Update CLAWMINE.md

Document the new build configuration.

4. Commit and push

git add -A
git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
git push origin feature/triton-kernels

5. Create new Jenkins pipeline

Create gh200-vllm-tfa-build:

  • Same as gh200-vllm-build but:
    • Pulls from feature/triton-kernels branch
    • Default IMAGE_TAG=gh200-vllm-tfa
    • Default VLLM_VERSION=v0.19.0

6. Trigger build

Wait for Mike's OK before triggering.


Tag Strategy

Image            Tag           Purpose
gh200-vllm       v0.19.0       Working fallback (no triton_kernels)
gh200-vllm-tfa   v0.19.0-tfa   New build with triton_kernels

If successful, gh200-vllm-tfa:v0.19.0-tfa becomes the production image.


Rollback Plan

If the build fails or runtime issues occur:

  1. The existing gh200-vllm:v0.19.0 image is untouched
  2. Just revert to using that tag
  3. No changes to main branch until verified working

References

  • vLLM cmake config: cmake/external_projects/triton_kernels.cmake
  • vLLM import logic: vllm/utils/import_utils.py
  • Triton repo: https://github.com/triton-lang/triton
  • matmul_ogs source (Triton repo, v3.6.0 tag): python/triton_kernels/triton_kernels/matmul_ogs.py