WORKING BUILD #43 - GH200 vLLM container builds successfully

Versions locked:
- vLLM: v0.18.2rc0
- flashinfer: v0.6.7
- flash-attention: hopper branch
- lmcache: dev branch
- infinistore: main
- triton: 3.6.0 (PyPI wheel)
- Base: nvcr.io/nvidia/pytorch:26.03-py3 (PyTorch 2.11.0a0, CUDA 13.2.0)
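
The PyPI-pinned versions above can be sanity-checked inside the finished container. Below is a minimal sketch of such a check, assuming standard distribution names (`vllm`, `flashinfer`, `triton`); it is not part of Build #43 itself, and components built from branches (flash-attention, lmcache, infinistore) are skipped because their installed version strings need not match a pin.

```python
# Hypothetical pin checker for the Build #43 container (not part of the
# build itself). Distribution names below are assumptions.
from importlib import metadata

PINS = {
    "vllm": "0.18.2rc0",
    "flashinfer": "0.6.7",
    "triton": "3.6.0",
}

def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, found_or_None)} for every mismatch.

    `get_version` is injectable so the logic can be tested without the
    packages installed; by default it reads installed dist metadata.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package missing entirely
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches

if __name__ == "__main__":
    bad = check_pins(PINS)
    for name, (want, got) in bad.items():
        print(f"PIN MISMATCH: {name}: expected {want}, found {got}")
    raise SystemExit(1 if bad else 0)
```

Running this as a smoke test after the image builds would catch a silent dependency bump before anyone burns a build slot on it.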

DO NOT MODIFY WITHOUT MIKE'S APPROVAL
Date:   2026-04-03 11:08:29 +00:00
Parent: 2442906d95
Commit: 659c79638c


@@ -1,3 +1,24 @@
# ==============================================================================
# ⚠️⚠️⚠️ WORKING BUILD - DO NOT TOUCH ⚠️⚠️⚠️
# ==============================================================================
# Build #43 succeeded on 2026-04-03 with these exact versions:
# - vLLM: v0.18.2rc0
# - flashinfer: v0.6.7
# - flash-attention: hopper branch
# - lmcache: dev branch
# - infinistore: main
# - triton: 3.6.0 (PyPI wheel)
# - Base: nvcr.io/nvidia/pytorch:26.03-py3 (PyTorch 2.11.0a0, CUDA 13.2.0)
#
# HARD RULES:
# 1. NO DOWNGRADES - CUDA 13+, PyTorch 2.9+, vLLM 0.18.1+
# 2. NO SKIPPING COMPILATION - Build from source
# 3. CLEAR ALL CHANGES WITH MIKE BEFORE MAKING THEM
# 4. ONE BUILD AT A TIME - Mike reports failure → I assess → I report
#
# If you need to modify this file, ask Mike first.
# ==============================================================================
# ---------- Builder Base ----------
# Using NVIDIA NGC PyTorch container (26.03) with:
# - PyTorch 2.11.0a0 (bleeding edge)