From edf12f7996eaaf681ca6229d0a7e0ad6f4351a41 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Mon, 6 Apr 2026 17:25:06 +0000
Subject: [PATCH] Clean up: remove PLAN-triton-kernels.md (merged into main)

---
 PLAN-triton-kernels.md | 129 -----------------------------------------
 1 file changed, 129 deletions(-)
 delete mode 100644 PLAN-triton-kernels.md

diff --git a/PLAN-triton-kernels.md b/PLAN-triton-kernels.md
deleted file mode 100644
index f30fcb9..0000000
--- a/PLAN-triton-kernels.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# Plan: Add Triton Kernels Support for vLLM v0.19.0
-
-**Date:** 2026-04-06
-**Status:** Ready for execution
-**Branch:** `feature/triton-kernels` (to be created)
-
----
-
-## Problem
-
-vLLM v0.19.0 container builds successfully but fails at runtime with:
-```
-No module named 'triton_kernels.matmul_ogs'
-```
-
-The error occurs because:
-- `triton_kernels` is a library of pre-written Triton kernels (separate from the `triton` compiler)
-- vLLM v0.19.0 requires it for MoE (Mixture of Experts) operations
-- Our Dockerfile builds the vLLM wheel with `pip wheel`, which skips the cmake step that normally fetches `triton_kernels`
-
----
-
-## Root Cause Analysis
-
-### What vLLM's cmake build does (that we skip):
-1. Fetches `triton_kernels` from the Triton repo (tag v3.6.0) → copies it to `vllm/third_party/triton_kernels/`
-2. Builds `flashmla`
-3. Builds `vllm_flash_attn` (we already handle this separately)
-
-### What's in `triton_kernels`:
-Located at `python/triton_kernels/triton_kernels/` in the Triton repo:
-- `matmul_ogs.py` — MoE kernels (the missing module)
-- `topk.py` — Top-k routing
-- `swiglu.py` — SwiGLU activation
-- `tensor.py` — Tensor utilities
-- `distributed.py` — Distributed ops
-
-### PyPI confusion:
-- `triton-kernels` on PyPI (v0.1.0) is from "Kernelize AI" — **NOT what vLLM needs**
-- The real `triton_kernels` lives in the Triton repo itself
-
----
-
-## Solution: Option A — Pip install from Git
-
-Add a build stage that installs `triton_kernels` directly from the Triton repo:
-
-```dockerfile
-FROM build-base AS build-triton-kernels
-# Install triton_kernels from the Triton repo (v3.6.0 matches vLLM's cmake default)
-RUN pip install git+https://github.com/triton-lang/triton.git@v3.6.0#subdirectory=python/triton_kernels
-# Copy only the installed package dir (not all of site-packages) into /wheels for the final stage
-RUN mkdir -p /wheels && pip show triton_kernels | grep Location | cut -d' ' -f2 | xargs -I {} cp -r {}/triton_kernels /wheels/
-```
-
-Then in the final `vllm-openai` stage:
-```dockerfile
-COPY --from=build-triton-kernels /wheels/triton_kernels /usr/local/lib/python3.12/dist-packages/triton_kernels
-```
-
-### Why this works:
-- vLLM's `import_utils.py` checks `site-packages` for `triton_kernels` first
-- Installing it there means vLLM will find it
-- Minimal change; doesn't affect existing working components
-
----
-
-## Execution Steps
-
-### 1. Create new branch
-```bash
-cd /home/openclaw/dev/grace-gpu-containers
-git checkout -b feature/triton-kernels
-```
-
-### 2. Modify Dockerfile
-Edit `vllm/Dockerfile`:
-- Add a `build-triton-kernels` stage after `build-triton`
-- Copy `triton_kernels` into the final stage
-- Update the header comment with the new version info
-
-### 3. Update CLAWMINE.md
-Document the new build configuration.
-
-### 4. Commit and push
-```bash
-git add -A
-git commit -m "Add triton_kernels for MoE support (vLLM v0.19.0)"
-git push origin feature/triton-kernels
-```
-
-### 5. Create new Jenkins pipeline
-Create `gh200-vllm-tfa-build`:
-- Same as `gh200-vllm-build`, but:
-  - Pulls from the `feature/triton-kernels` branch
-  - Default `IMAGE_TAG=gh200-vllm-tfa`
-  - Default `VLLM_VERSION=v0.19.0`
-
-### 6. Trigger build
-Wait for Mike's OK before triggering.
-
----
-
-## Tag Strategy
-
-| Image | Tag | Purpose |
-|-------|-----|---------|
-| `gh200-vllm` | `v0.19.0` | Working fallback (no triton_kernels) |
-| `gh200-vllm-tfa` | `v0.19.0-tfa` | New build with triton_kernels |
-
-If successful, `gh200-vllm-tfa:v0.19.0-tfa` becomes the production image.
-
----
-
-## Rollback Plan
-
-If the build fails or runtime issues occur:
-1. The existing `gh200-vllm:v0.19.0` image is untouched
-2. Just revert to using that tag
-3. No changes land on main until the new image is verified working
-
----
-
-## References
-
-- vLLM cmake config: `cmake/external_projects/triton_kernels.cmake`
-- vLLM import logic: `vllm/utils/import_utils.py`
-- Triton repo: https://github.com/triton-lang/triton
-- `matmul_ogs` source at the v3.6.0 tag: `python/triton_kernels/triton_kernels/matmul_ogs.py`
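
The "Why this works" claim in the deleted plan — that vLLM finds `triton_kernels` once it sits in site-packages — can be sanity-checked with a small probe. This is a sketch, not vLLM's actual code: the helper name below is hypothetical, and vLLM's real check lives in `vllm/utils/import_utils.py`.

```python
import importlib.util


def triton_kernels_available() -> bool:
    """Probe for the real triton_kernels package (hypothetical helper;
    vLLM's actual logic lives in vllm/utils/import_utils.py)."""
    # The package must be importable from the current environment...
    if importlib.util.find_spec("triton_kernels") is None:
        return False
    # ...and must ship the matmul_ogs module vLLM imports for MoE ops.
    # The unrelated "triton-kernels" project on PyPI would fail this check.
    return importlib.util.find_spec("triton_kernels.matmul_ogs") is not None


print(triton_kernels_available())
```

Running this inside the built image (e.g. via `docker run ... python3 -c`) before promoting the `-tfa` tag would catch a bad `COPY` destination early, without waiting for a full vLLM startup.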