Files

biondizzle a399fbc8c6 Add MAX_JOBS=2 for LMCache, restore vLLM build from source

- LMCache: reduced parallelism to avoid memory pressure
- vLLM: restored build from source (was using PyPI wheel)
- Will test with docker --memory=24g limit

2026-04-03 02:49:43 +00:00

.gitignore: Updated for CUDA 13 (2025-10-21 19:21:13 +00:00)
Dockerfile: Add MAX_JOBS=2 for LMCache, restore vLLM build from source (2026-04-03 02:49:43 +00:00)
README.md: Updated to v0.11.1rc3 (2025-10-23 18:11:41 +00:00)

README.md

vLLM images for GH200

Hosted here

docker login
# Alternative
# docker buildx build --platform linux/arm64 --memory=600g -t rajesh550/gh200-vllm:0.9.0.1 .
docker build --memory=450g --platform linux/arm64 -t rajesh550/gh200-vllm:0.11.1rc2 . 2>&1 | tee build.log
docker push rajesh550/gh200-vllm:0.11.1rc2
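
The commands above can be wrapped in a small script so the image tag is written once and a failed build does not fall through to the push. This is a minimal sketch, not the repository's own tooling: the variable names `IMAGE_REPO`, `IMAGE_TAG`, `BUILD_MEM` and the function names `image_ref` and `build_and_push` are hypothetical, and the defaults are copied from the commands above. Without `pipefail`, the exit status of `docker build ... | tee build.log` is that of `tee`, so a build error would go unnoticed by the script.

```shell
#!/usr/bin/env bash
# Sketch only: the variable and function names are illustrative assumptions.
set -eu

IMAGE_REPO="${IMAGE_REPO:-rajesh550/gh200-vllm}"  # repository from the README commands
IMAGE_TAG="${IMAGE_TAG:-0.11.1rc2}"               # tag from the README commands
BUILD_MEM="${BUILD_MEM:-450g}"                    # value passed to --memory

# Print the full image reference, e.g. repo:tag.
image_ref() {
  printf '%s:%s\n' "$IMAGE_REPO" "$IMAGE_TAG"
}

# Log in, build for linux/arm64, and push.
# The body runs in a subshell so the pipefail setting stays local to it.
build_and_push() (
  set -o pipefail  # make the pipeline fail when docker build fails, despite tee
  ref="$(image_ref)"
  docker login
  docker build --memory="$BUILD_MEM" --platform linux/arm64 -t "$ref" . 2>&1 | tee build.log
  docker push "$ref"
)
```

After sourcing the script, a different tag could be built with `IMAGE_TAG=0.11.1rc3 build_and_push`, and the build output is still captured in `build.log`.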