Clean up: remove PLAN-triton-kernels.md (merged into main)

2026-04-06 17:25:06 +00:00
parent e6cc28a942
commit edf12f7996
1 changed files with 0 additions and 129 deletions
--- a/PLAN-triton-kernels.md
+++ b/PLAN-triton-kernels.md
@@ -1,129 +0,0 @@
 # Plan: Add Triton Kernels Support for vLLM v0.19.0
 **Date:** 2026-04-06
 
 **Status:** Ready for execution
 
 **Branch:** `feature/triton-kernels` (to be created)
 
 ---
 ## Problem
 vLLM v0.19.0 container builds successfully but fails at runtime with:
 ```
 No module named 'triton_kernels.matmul_ogs'
 ```
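
You can reproduce the failure, and later confirm the fix, by attempting the same import with the container's interpreter. This is a minimal sketch that assumes `python3` is the interpreter vLLM runs under. `check_matmul_ogs` is a hypothetical helper name and is not part of vLLM. Inside the container, you could run it from a shell started with something like `docker run --rm -it --entrypoint bash <image>`.

```bash
# Hypothetical helper: returns 0 if triton_kernels.matmul_ogs imports with the
# current python3, otherwise prints a message to stderr and returns 1.
check_matmul_ogs() {
  if python3 -c 'import triton_kernels.matmul_ogs' 2>/dev/null; then
    echo "triton_kernels.matmul_ogs: OK"
  else
    echo "triton_kernels.matmul_ogs: missing" >&2
    return 1
  fi
}
```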
 The error occurs because:
 - `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
 - vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
 - Our Dockerfile builds vLLM via `pip build --wheel`, which skips the cmake step that normally fetches `triton_kernels`
 ---
 ## Root Cause Analysis
 ### What vLLM's cmake build does (that we skip):
 1. Fetches `triton_kernels` from the Triton repo (tag v3.6.0) and copies it to `vllm/third_party/triton_kernels/`
 2. Builds `flashmla`
 3. Builds `vllm_flash_attn` (we already handle this separately)
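
The fix depends on matching the tag that vLLM's cmake step would have fetched, so it is safer to read the pin from the vLLM checkout than to rely on memory. This is a minimal sketch. It assumes the tag appears literally as a `vX.Y.Z` string in `cmake/external_projects/triton_kernels.cmake`, which is not verified here. `triton_kernels_pin` is a hypothetical helper.

```bash
# Hypothetical helper: print the first vX.Y.Z string found in the cmake file.
# Assumes the Triton tag is written literally in that file (an assumption).
triton_kernels_pin() {
  local cmake_file=${1:-cmake/external_projects/triton_kernels.cmake}
  grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' "$cmake_file" | head -n 1
}
```

Run it from the root of a vLLM v0.19.0 checkout. If it prints anything other than `v3.6.0`, update the `@v3.6.0` ref in the `build-triton-kernels` stage to match.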
 ### What's in `triton_kernels`:
 Located at `python/triton_kernels/triton_kernels/` in the Triton repo:
 - `matmul_ogs.py` — MoE kernels (the missing module)
 - `topk.py` — Top-k routing
 - `swiglu.py` — SwiGLU activation
 - `tensor.py` — Tensor utilities
 - `distributed.py` — Distributed ops
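
These files are what the real package provides, so a layout check on whatever directory ends up in the image catches a wrong or partial copy early. This is a minimal sketch. `verify_triton_kernels_dir` is a hypothetical helper, and its file list is simply the one above, which may not be complete for other Triton tags.

```bash
# Hypothetical helper: check that a triton_kernels package directory contains the
# modules listed in this plan. Prints each missing file; returns 1 if any are missing.
verify_triton_kernels_dir() {
  local dir=$1 missing=0 f
  for f in matmul_ogs.py topk.py swiglu.py tensor.py distributed.py; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, in the final image: `verify_triton_kernels_dir /usr/local/lib/python3.12/dist-packages/triton_kernels`.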
 ### PyPI confusion:
 - `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — **NOT what vLLM needs**
 - The real `triton_kernels` is in the Triton repo itself
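
A successful `import triton_kernels` does not by itself prove that the right package is installed. Print where the import resolves from, then check that the directory holds `matmul_ogs.py`. This is a minimal sketch. `triton_kernels_location` is a hypothetical helper, and it assumes the package has an `__init__.py`, so `__file__` is set.

```bash
# Hypothetical helper: print the directory that `import triton_kernels` resolves to;
# exits non-zero if the import fails.
triton_kernels_location() {
  python3 -c 'import os, triton_kernels; print(os.path.dirname(os.path.abspath(triton_kernels.__file__)))'
}
```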
 ---
 ## Solution: Option A — Pip install from Git
 Add a build stage that installs `triton_kernels` directly from the Triton repo:
 ```dockerfile
 FROM build-base AS build-triton-kernels
 # Install triton_kernels from Triton repo (v3.6.0 matches vLLM's cmake default)
 RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels
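 # Copy only the installed triton_kernels package directory into /wheels for the final stage
 # (`pip show` reports the parent site/dist-packages directory, not the package itself)
 RUN mkdir -p /wheels && \
     cp -r "$(pip show triton_kernels | awk '/^Location:/ {print $2}')/triton_kernels" /wheels/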
 ```
 Then in the final `vllm-openai` stage:
 ```dockerfile
 COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
 ```
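
The COPY destination hard-codes `python3.12/dist-packages`. That only works if the directory is on the final image's import path. This is a minimal sketch of a check to run inside the final image, for example `on_python_path /usr/local/lib/python3.12/dist-packages`. `on_python_path` is a hypothetical helper that only inspects `sys.path`.

```bash
# Hypothetical helper: exit 0 if DIR is on python3's import path (sys.path),
# comparing resolved paths so symlinks and trailing slashes don't matter.
on_python_path() {
  python3 - "$1" <<'EOF'
import os
import sys

target = os.path.realpath(sys.argv[1])
# An empty sys.path entry stands for the current directory.
hits = [p for p in sys.path if os.path.realpath(p or ".") == target]
sys.exit(0 if hits else 1)
EOF
}
```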
 ### Why this works:
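 - vLLM's `import_utils.py` checks for an installed `triton_kernels` package first (in the interpreter's `site-packages`, or `dist-packages` on Debian/Ubuntu Python builds), before looking for the copy under `vllm/third_party/`
 - Copying the package into `/usr/local/lib/python3.12/dist-packages` puts it on that path, so vLLM finds it without the cmake fetch
 - Minimal change that leaves the existing, working build stages untouched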
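
The lookup described above can be approximated as "top-level `triton_kernels` first, then the copy under `vllm/third_party/`". The sketch below mimics that order with `importlib.util.find_spec` to show which location a given environment would use. It is an assumption based on this plan's description, not a copy of vLLM's `import_utils.py`, and `which_triton_kernels` is a hypothetical name.

```bash
# Hypothetical helper: report which triton_kernels location would be picked,
# checking the top-level package before vLLM's vendored third_party copy
# (order assumed from this plan, not taken from vLLM's source).
which_triton_kernels() {
  python3 - <<'EOF'
import importlib.util

def found(name):
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # A parent package (e.g. vllm) is missing or failed to import.
        return False

for name in ("triton_kernels", "vllm.third_party.triton_kernels"):
    if found(name):
        print(name)
        break
else:
    print("none")
EOF
}
```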
 ---
 ## Execution Steps
 ### 1. Create new branch
 ```bash
 cd /home/openclaw/dev/grace-gpu-containers
 git checkout -b feature/triton-kernels
 ```
 ### 2. Modify Dockerfile
 Edit `vllm/Dockerfile`:
 - Add `build-triton-kernels` stage after `build-triton`
 - Copy `triton_kernels` to final stage
 - Update header comment with new version info
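
Before committing, a quick check can confirm that the edited `vllm/Dockerfile` declares the new stage after `build-triton`. This is a minimal sketch. `check_stage_order` and `stage_line` are hypothetical helpers that only look at `FROM ... AS <name>` lines and compare stage names exactly.

```bash
# Hypothetical helper: print the line number of "FROM ... AS NAME" in FILE.
stage_line() {
  awk -v name="$2" '
    toupper($1) == "FROM" {
      for (i = 2; i < NF; i++)
        if (toupper($i) == "AS" && $(i + 1) == name) { print NR; exit }
    }' "$1"
}

# Hypothetical helper: succeed only if build-triton-kernels is declared after build-triton.
check_stage_order() {
  local file=$1 a b
  a=$(stage_line "$file" build-triton)
  b=$(stage_line "$file" build-triton-kernels)
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" -lt "$b" ]
}
```

Usage: `check_stage_order vllm/Dockerfile && echo ok`.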
 ### 3. Update CLAWMINE.md
 Document the new build configuration.
 ### 4. Commit and push
 ```bash
 git add -A
 git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
 git push origin feature/triton-kernels
 ```
 ### 5. Create new Jenkins pipeline
 Create `gh200-vllm-tfa-build`:
 - Same as `gh200-vllm-build` but:
   - Pulls from `feature/triton-kernels` branch
   - Default `IMAGE_TAG=gh200-vllm-tfa`
   - Default `VLLM_VERSION=v0.19.0`
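
The new pipeline pulls from `feature/triton-kernels`, so a pre-flight check that the branch actually reached the remote avoids a confusing checkout failure in Jenkins. This is a minimal sketch using `git ls-remote --exit-code`, which exits with status 2 when no matching ref exists. `remote_branch_exists` is a hypothetical helper, and its defaults mirror this plan.

```bash
# Hypothetical pre-flight check: succeed only if BRANCH exists on REMOTE.
# `git ls-remote --exit-code` exits with status 2 when no ref matches.
remote_branch_exists() {
  local remote=${1:-origin} branch=${2:-feature/triton-kernels}
  git ls-remote --exit-code --heads "$remote" "refs/heads/$branch" >/dev/null
}
```

Usage: `remote_branch_exists origin feature/triton-kernels || echo "push the branch first"`.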
 ### 6. Trigger build
 Wait for Mike's OK before triggering.
 
 ---
 ## Tag Strategy
 | Image | Tag | Purpose |
 |-------|-----|---------|
 | `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
 | `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |
 
 If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.
 
 ---
 ## Rollback Plan
 If the build fails or runtime issues occur:
 1. The existing `gh200-vllm:v0.19.0` image is untouched
 2. Point deployments back at that tag
 3. No changes to main branch until verified working
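
The rollback rule can be expressed as a smoke-test gate: deploy the `-tfa` image only if the previously missing import now works inside it, and otherwise stay on the untouched fallback. This is a minimal sketch. `pick_image` and `docker_import_check` are hypothetical helpers. The check is passed in as a command so it can be swapped out, and the Docker variant assumes the image's `python3` is the one vLLM uses. Typical use: `IMAGE=$(pick_image docker_import_check)`.

```bash
# Hypothetical helper: print the image to deploy. $1 names a command that is
# called with an image reference and must exit 0 if that image is healthy.
pick_image() {
  local check=$1
  local candidate=gh200-vllm-tfa:v0.19.0-tfa
  local fallback=gh200-vllm:v0.19.0
  if "$check" "$candidate" >/dev/null 2>&1; then
    echo "$candidate"
  else
    echo "$fallback"
  fi
}

# Example check (assumes Docker is available and the image's python3 runs vLLM).
docker_import_check() {
  docker run --rm --entrypoint python3 "$1" -c 'import triton_kernels.matmul_ogs'
}
```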
 ---
 ## References
 - vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
 - vLLM import logic: `vllm/utils/import_utils.py`
 - Triton repo: https://github.com/triton-lang/triton
 - Missing module at Triton tag v3.6.0: `python/triton_kernels/triton_kernels/matmul_ogs.py`