diff --git a/CLAWMINE.md b/CLAWMINE.md deleted file mode 100644 index c69c6dd..0000000 --- a/CLAWMINE.md +++ /dev/null @@ -1,195 +0,0 @@ -# GH200 vLLM Container Build Pipeline - -> Managed by Clawmine — `/home/openclaw/dev/grace-gpu-containers` - -## Overview - -Building vLLM containers for NVIDIA GH200 (Grace Hopper, ARM64 + H100 GPU). The challenge: prebuilt wheels for `aarch64` are limited, and NVIDIA doesn't publish NGC Dockerfiles. - -## Jenkins Pipeline - -**Server:** https://jenkins.sweetapi.com/ -**Job:** `gh200-vllm-build` -**Status:** Configured, never run (ready for first build) - -### Jenkins Server (Build Machine) - -``` -Host: 66.135.24.21 -User: root -Pass: Wy9,za7+8BL(v@ZT -``` - -**Setup:** -- Docker buildx `multiarch` builder configured -- QEMU user-static installed for ARM64 emulation -- 780GB free disk - -### Build Parameters - -| Parameter | Default | Description | -|-----------|---------|-------------| -| `VLLM_VERSION` | `v0.18.1` | vLLM git tag | -| `CUDA_VERSION` | `13.0.1` | CUDA version | -| `IMAGE_TAG` | `gh200-vllm` | Docker image name | -| `PUSH_TO_REGISTRY` | `true` | Push to Vultr CR after build | - -### Container Registry - -**URL:** `sjc.vultrcr.com/charizard` -**User:** `891294d0-df76-4c37-b41d-2b77f95a54c1` -**Pass:** `H3aE2NfqRLs5Aio6SCnnDKBJwnB6rsJfFZ7E` - -**Images:** -- `sjc.vultrcr.com/charizard/gh200-vllm:v0.18.1` -- `sjc.vultrcr.com/charizard/gh200-vllm:latest` - -### Build History - -| Build # | Version | Status | Duration | Notes | -|---------|---------|--------|----------|-------| -| 18 | v0.18.1 | RUNNING | - | Started 2026-04-03 04:11 UTC — native GH200 builder | -| 17 | main | FAILED | ~4 min | vLLM main branch CUDA 13 API mismatch | -| 2 | v0.18.1 | FAILED | - | Started 2026-04-02 ~20:03 UTC — setuptools pinned | -| 1 | v0.18.1 | FAILED | ~15 min | setuptools 82.0.1 incompatible with LMCache | - -### Monitoring - -**Cron Job:** Every 30 minutes, checks build status and notifies on completion. 
-- Job ID: `e06d540f-899e-4358-92df-85fe036b05e2` -- Script: `~/.openclaw/scripts/check-jenkins-vllm-build.sh` - -**Manual check:** -```bash -curl -s -u "admin:1112cc255997ecb7c34d0089c4edddd976" \ - "https://jenkins.sweetapi.com/job/gh200-vllm-build/lastBuild/api/json" | jq '{result, building, duration}' -``` - -### Triggering a Build - -```bash -# Via API -curl -X POST "https://jenkins.sweetapi.com/job/gh200-vllm-build/buildWithParameters" \ - -u "admin:1112cc255997ecb7c34d0089c4edddd976" \ - -d "VLLM_VERSION=v0.18.1" - -# Check status -curl -s -u "admin:1112cc255997ecb7c34d0089c4edddd976" \ - "https://jenkins.sweetapi.com/job/gh200-vllm-build/lastBuild/api/json" | jq '.result' -``` - -## Current State - -### Dockerfile (`vllm/Dockerfile`) - -Builds everything from source: -- **triton** (release/3.5.x) -- **xformers** (johnnynunez fork) -- **flashinfer** (v0.4.1) -- **flash-attention** (hopper branch) -- **lmcache** (v0.3.7) -- **infinistore** (main) -- **vLLM** (configurable via `VLLM_REF`) - -**Target Architecture:** `9.0a` (NVIDIA Hopper) - -### Latest Versions (as of 2026-04-02) - -| Package | Stable | Latest | -|---------|--------|--------| -| vLLM | v0.18.1 | v0.19.0rc1 | -| Triton | 3.6.0 | 3.6.0 | -| CUDA | 13.0.1 | 13.0.1 | - -## PyPI Wheel Status for aarch64 - -| Package | aarch64 Wheel | Notes | -|---------|---------------|-------| -| vLLM 0.18.1 | ✅ Yes | Includes FA2, FA3, MoE kernels | -| Triton 3.6.0 | ✅ Yes | Official wheel | -| flashinfer | ❌ No | Must build from source | -| xformers | ❌ No | Must build from source | - -**Key Finding:** The official vLLM aarch64 wheel includes pre-compiled CUDA kernels: -- `vllm/_C.abi3.so` (381MB) -- `vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so` (263MB) -- `vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so` (142MB) -- `vllm/_moe_C.abi3.so` (202MB) - -This means **basic vLLM on GH200 can use PyPI wheels directly** without compilation. 
-
-## Build Options
-
-### Option 1: QEMU Cross-Compilation (Current Setup)
-
-**Pros:**
-- Works on existing x86 Jenkins server
-- No GH200 required
-
-**Cons:**
-- Very slow (hours)
-- CUDA kernels may not be fully optimized
-- flashinfer/xformers need source builds
-
-**When to use:** When no GH200 available
-
-### Option 2: Native GH200 Build
-
-**Pros:**
-- Much faster compilation
-- Properly optimized ARM64 + Hopper kernels
-- All dependencies build correctly
-
-**Cons:**
-- Need GH200 access
-
-**When to use:** For production-quality builds
-
-### Option 3: Hybrid (PyPI wheels + source builds)
-
-**Approach:**
-1. Use vLLM + Triton from PyPI (pre-compiled aarch64)
-2. Build only flashinfer/xformers from source (if needed)
-
-**Pros:**
-- Fastest option
-- Official wheels for core components
-
-**Cons:**
-- May miss some optimizations
-- flashinfer/xformers still need source builds
-
-## Recommended Path
-
-1. **Immediate:** Try PyPI wheels on a GH200 — if they work, no build needed
-2. **If PyPI insufficient:** Run Jenkins build with `VLLM_VERSION=v0.18.1`
-3. **Production:** Get a GH200 for native builds
-
-## Source Repo
-
-- **Our fork:** https://sweetapi.com/biondizzle/grace-gpu-containers.git (Jenkins pulls from here)
-- **Upstream:** https://github.com/rajesh-s/grace-gpu-containers
-
-### Local repo: `/home/openclaw/dev/grace-gpu-containers`
-
-```bash
-git remote -v
-origin ssh://git@sweetapi.com:2222/biondizzle/grace-gpu-containers.git (push)
-upstream https://github.com/rajesh-s/grace-gpu-containers.git (fetch)
-```
-
-### Changes from upstream
-
-1. **setuptools pin** — Pin setuptools to `<81.0.0` for LMCache compatibility
-   - Changed: `RUN uv pip install -U build cmake ninja pybind11 setuptools wheel`
-   - To: `RUN uv pip install -U build cmake ninja pybind11 "setuptools>=77.0.3,<81.0.0" wheel`
-
-2. **pip for flash-attention** — Use `pip wheel` instead of `pip3 wheel` (venv has pip, not pip3)
-   - Changed: `pip3 wheel . -v --no-deps`
-   - To: `pip wheel . -v --no-deps`
-
-3. **CLAWMINE.md** — This documentation file
-
----- 
-
-*Last updated: 2026-04-03 by Clawmine*
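
The setuptools pin listed under "Changes from upstream" (`>=77.0.3,<81.0.0`) can be sanity-checked before a build with a small shell helper. The sketch below is not part of the repository: the function names `version_ge` and `setuptools_in_pin` are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils). It compares plain `X.Y.Z` release versions only and does not implement full PEP 440 ordering (pre-releases, post-releases, and so on).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): checks whether a version string
# falls inside the pinned range setuptools>=77.0.3,<81.0.0.
# Assumes `sort -V` is available; handles plain X.Y.Z versions only.

# version_ge A B: succeeds when A >= B under `sort -V` ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# setuptools_in_pin VERSION: succeeds when 77.0.3 <= VERSION < 81.0.0.
setuptools_in_pin() {
  version_ge "$1" "77.0.3" && ! version_ge "$1" "81.0.0"
}

# When a version is passed as the first argument, report on it.
if [ -n "${1:-}" ]; then
  if setuptools_in_pin "$1"; then
    echo "setuptools $1 satisfies the pin"
  else
    echo "setuptools $1 is outside the pin"
  fi
fi
```

Inside the build venv, the installed version could be fed in with something like `python -c 'import importlib.metadata as m; print(m.version("setuptools"))'`. For example, the `82.0.1` release that build 1 reported as incompatible with LMCache falls outside the range.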