From edf12f7996eaaf681ca6229d0a7e0ad6f4351a41 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Mon, 6 Apr 2026 17:25:06 +0000
Subject: [PATCH] Clean up: remove PLAN-triton-kernels.md (merged into main)

---
 PLAN-triton-kernels.md | 129 -----------------------------------------
 1 file changed, 129 deletions(-)
 delete mode 100644 PLAN-triton-kernels.md

diff --git a/PLAN-triton-kernels.md b/PLAN-triton-kernels.md
deleted file mode 100644
index f30fcb9..0000000
--- a/PLAN-triton-kernels.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# Plan: Add Triton Kernels Support for vLLM v0.19.0
-
-**Date:** 2026-04-06
-**Status:** Ready for execution
-**Branch:** `feature/triton-kernels` (to be created)
-
----
-
-## Problem
-
-vLLM v0.19.0 container builds successfully but fails at runtime with:
-```
-No module named 'triton_kernels.matmul_ogs'
-```
-
-The error occurs because:
-- `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
-- vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
-- Our Dockerfile builds the vLLM wheel with `pip wheel`, which skips the cmake step that normally fetches `triton_kernels`
-
----
-
-## Root Cause Analysis
-
-### What vLLM's cmake build does (that we skip):
-1. Fetches `triton_kernels` from the Triton repo (tag v3.6.0) → copies it to `vllm/third_party/triton_kernels/`
-2. Builds `flashmla`
-3. Builds `vllm_flash_attn` (we already handle this separately)
-
-### What's in `triton_kernels`:
-Located at `python/triton_kernels/triton_kernels/` in the Triton repo:
-- `matmul_ogs.py` — MoE kernels (the missing module)
-- `topk.py` — Top-k routing
-- `swiglu.py` — SwiGLU activation
-- `tensor.py` — Tensor utilities
-- `distributed.py` — Distributed ops
-
-### PyPI confusion:
-- `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — **NOT what vLLM needs**
-- The real `triton_kernels` lives in the Triton repo itself
-
----
-
-## Solution: Option A — Pip install from Git
-
-Add a build stage that installs `triton_kernels` directly from the Triton repo:
-
-```dockerfile
-FROM build-base AS build-triton-kernels
-# Install triton_kernels from the Triton repo (v3.6.0 matches vLLM's cmake default)
-RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels
-# Copy only the installed package dir (not all of site-packages) into /wheels for the final stage
-RUN mkdir -p /wheels && pip show triton_kernels | grep Location | cut -d' ' -f2 | xargs -I {} cp -r {}/triton_kernels /wheels/
-```
-
-Then in the final `vllm-openai` stage:
-```dockerfile
-COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
-```
-
-### Why this works:
-- vLLM's `import_utils.py` checks `site-packages` for `triton_kernels` first
-- Installing it there means vLLM will find it
-- Minimal change; doesn't affect existing working components
-
----
-
-## Execution Steps
-
-### 1. Create new branch
-```bash
-cd /home/openclaw/dev/grace-gpu-containers
-git checkout -b feature/triton-kernels
-```
-
-### 2. Modify Dockerfile
-Edit `vllm/Dockerfile`:
-- Add a `build-triton-kernels` stage after `build-triton`
-- Copy `triton_kernels` into the final stage
-- Update the header comment with the new version info
-
-### 3. Update CLAWMINE.md
-Document the new build configuration.
-
-### 4. Commit and push
-```bash
-git add -A
-git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
-git push origin feature/triton-kernels
-```
-
-### 5. Create new Jenkins pipeline
-Create `gh200-vllm-tfa-build`:
-- Same as `gh200-vllm-build`, but:
-  - Pulls from the `feature/triton-kernels` branch
-  - Default `IMAGE_TAG=gh200-vllm-tfa`
-  - Default `VLLM_VERSION=v0.19.0`
-
-### 6. Trigger build
-Wait for Mike's OK before triggering.
-
----
-
-## Tag Strategy
-
-| Image | Tag | Purpose |
-|-------|-----|---------|
-| `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
-| `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |
-
-If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.
-
----
-
-## Rollback Plan
-
-If the build fails or runtime issues occur:
-1. The existing `gh200-vllm:v0.19.0` image is untouched
-2. Just revert to using that tag
-3. No changes land on main until the new image is verified working
-
----
-
-## References
-
-- vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
-- vLLM import logic: `vllm/utils/import_utils.py`
-- Triton repo: https://github.com/triton-lang/triton
-- `matmul_ogs` source at the v3.6.0 tag: `python/triton_kernels/triton_kernels/matmul_ogs.py`
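
The "Why this works" claim in the deleted plan — that vLLM finds `triton_kernels` once it sits in site-packages — can be sanity-checked with a small probe. This is a sketch, not vLLM's actual code: the helper name below is hypothetical, and vLLM's real check lives in `vllm/utils/import_utils.py`.

```python
import importlib.util


def triton_kernels_available() -> bool:
    """Probe for the real triton_kernels package (hypothetical helper;
    vLLM's actual logic lives in vllm/utils/import_utils.py)."""
    # The package must be importable from the current environment...
    if importlib.util.find_spec("triton_kernels") is None:
        return False
    # ...and must ship the matmul_ogs module vLLM imports for MoE ops.
    # The unrelated "triton-kernels" project on PyPI would fail this check.
    return importlib.util.find_spec("triton_kernels.matmul_ogs") is not None


print(triton_kernels_available())
```

Running this inside the built image (e.g. via `docker run ... python3 -c`) before promoting the `-tfa` tag would catch a bad `COPY` destination early, without waiting for a full vLLM startup.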