Add build-triton-kernels stage to fetch triton_kernels from Triton v3.6.0

- Install to site-packages for vLLM to find at runtime
- Resolves: No module named 'triton_kernels.matmul_ogs'
- Image tag: gh200-vllm-tfa:v0.19.0-tfa
# Plan: Add Triton Kernels Support for vLLM v0.19.0

Date: 2026-04-06
Status: Ready for execution
Branch: `feature/triton-kernels` (to be created)
## Problem

The vLLM v0.19.0 container builds successfully but fails at runtime with:

```
No module named 'triton_kernels.matmul_ogs'
```
The error occurs because:

- `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
- vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
- Our Dockerfile builds vLLM via `pip build --wheel`, which skips the cmake step that normally fetches `triton_kernels`
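The failure can be confirmed (or ruled out) with a quick import smoke test. This is a sketch; it assumes the image's interpreter is available as `python3`:

```shell
# Smoke test: succeeds quietly if triton_kernels is installed, otherwise
# prints the same error class vLLM hits at startup
python3 -c "import triton_kernels.matmul_ogs" 2>/dev/null \
  && echo "triton_kernels present" \
  || echo "No module named 'triton_kernels.matmul_ogs'"
```

Running this inside the built container distinguishes a packaging gap from an unrelated startup failure.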
## Root Cause Analysis

What vLLM's cmake build does (that we skip):

- Fetches `triton_kernels` from the Triton repo (tag v3.6.0) and copies it to `vllm/third_party/triton_kernels/`
- Builds `flashmla`
- Builds `vllm_flash_attn` (we already handle this separately)
What's in `triton_kernels`:

Located at `python/triton_kernels/triton_kernels/` in the Triton repo:

- `matmul_ogs.py` — MoE kernels (the missing module)
- `topk.py` — Top-k routing
- `swiglu.py` — SwiGLU activation
- `tensor.py` — Tensor utilities
- `distributed.py` — Distributed ops
PyPI confusion:

- `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — NOT what vLLM needs
- The real `triton_kernels` lives in the Triton repo itself
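One way to tell the two apart, sketched here under the assumption that pip metadata is queryable in the environment: the PyPI project reports version 0.1.0, while the Triton-repo package tracks Triton's own versioning.

```shell
# Provenance check: shows the installed version (the unrelated PyPI project
# reports 0.1.0) or falls back to a clear message when nothing is installed
python3 -m pip show triton_kernels 2>/dev/null | grep '^Version' \
  || echo "triton_kernels not installed"
```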
## Solution: Option A — Pip install from Git

Add a build stage that installs `triton_kernels` directly from the Triton repo:

```dockerfile
FROM build-base AS build-triton-kernels

# Install triton_kernels from the Triton repo (v3.6.0 matches vLLM's cmake default)
RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels

# Copy the installed package into /wheels for the final stage.
# Note: `pip show` reports the site-packages directory, so append the package
# name; otherwise the whole site-packages tree gets copied instead.
RUN mkdir -p /wheels && \
    cp -r "$(pip show triton_kernels | grep '^Location' | cut -d' ' -f2)/triton_kernels" /wheels/
```
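The `Location`-parsing pipeline in the second `RUN` can be exercised in isolation against canned `pip show` output (a sketch; the sample path and version are illustrative, not taken from a real install):

```shell
# Canned `pip show` output standing in for the live environment
sample='Name: triton_kernels
Version: 3.6.0
Location: /usr/local/lib/python3.12/dist-packages'

# Same extraction as the Dockerfile: grab the field after "Location: "
location=$(printf '%s\n' "$sample" | grep '^Location' | cut -d' ' -f2)

# The package directory itself is what the final stage needs
echo "${location}/triton_kernels"
```

This prints `/usr/local/lib/python3.12/dist-packages/triton_kernels`, which is the path the final stage's `COPY --from` expects under `/wheels/`.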
Then in the final `vllm-openai` stage:

```dockerfile
COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
```
Why this works:

- vLLM's `import_utils.py` checks `site-packages` for `triton_kernels` first
- Installing it there means vLLM will find it at import time
- The change is minimal and doesn't affect existing working components
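Which directories the interpreter actually treats as `site-packages` can be listed directly. A sketch using the stdlib `site` module; the exact paths depend on the Python build inside the image:

```shell
# List the interpreter's site-packages directories; the COPY destination in
# the final stage must match one of these for the import to resolve
python3 -c "import site; print('\n'.join(site.getsitepackages()))"
```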
## Execution Steps

### 1. Create new branch

```bash
cd /home/openclaw/dev/grace-gpu-containers
git checkout -b feature/triton-kernels
```
### 2. Modify Dockerfile

Edit `vllm/Dockerfile`:

- Add a `build-triton-kernels` stage after `build-triton`
- Copy `triton_kernels` into the final stage
- Update the header comment with the new version info
### 3. Update CLAWMINE.md

Document the new build configuration.
### 4. Commit and push

```bash
git add -A
git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
git push origin feature/triton-kernels
```
### 5. Create new Jenkins pipeline

Create `gh200-vllm-tfa-build`, the same as `gh200-vllm-build` but with these changes:

- Pulls from the `feature/triton-kernels` branch
- Default `IMAGE_TAG=gh200-vllm-tfa`
- Default `VLLM_VERSION=v0.19.0`
### 6. Trigger build

Wait for Mike's OK before triggering.
## Tag Strategy

| Image | Tag | Purpose |
|---|---|---|
| `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
| `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |

If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.
## Rollback Plan

If the build fails or runtime issues occur:

- The existing `gh200-vllm:v0.19.0` image is untouched
- Just revert to using that tag
- No changes land on `main` until the new build is verified working
## References

- vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
- vLLM import logic: `vllm/utils/import_utils.py`
- Triton repo: https://github.com/triton-lang/triton
- Triton v3.6.0 tag: `python/triton_kernels/triton_kernels/matmul_ogs.py`