biondizzle 6255c94359 Downgrade to CUDA 12.8.1 for vLLM compatibility
cuMemcpyBatchAsync API changed in CUDA 13 - removed fail_idx parameter.
vLLM code targets CUDA 12.8 API. Downgrade to CUDA 12.8.1.
2026-04-03 07:43:19 +00:00

vLLM container images for NVIDIA GH200 (Grace Hopper, linux/arm64)

Hosted on Docker Hub as rajesh550/gh200-vllm.

    docker login
    # Alternative
    # docker buildx build --platform linux/arm64 --memory=600g -t rajesh550/gh200-vllm:0.9.0.1 .
    docker build --memory=450g --platform linux/arm64 -t rajesh550/gh200-vllm:0.11.1rc2 . 2>&1 | tee build.log
    docker push rajesh550/gh200-vllm:0.11.1rc2
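If these commands are run from a script, note that the exit status of "docker build ... | tee build.log" is normally tee's, so a failed build would still fall through to docker push. A minimal sketch that guards against this, assuming bash; the function name build_and_push, the IMAGE default and the DOCKER override hook are hypothetical additions, not part of the original instructions:

```shell
#!/usr/bin/env bash
# Sketch: build the GH200 vLLM image and push it only if the build succeeded.
# Hypothetical names: build_and_push, IMAGE, DOCKER (not from the original README).
set -o pipefail  # a pipeline's status is non-zero if ANY stage fails, not just tee

build_and_push() {
  local tag="${1:?usage: build_and_push <tag>}"   # abort if no tag is given
  local image="${IMAGE:-rajesh550/gh200-vllm}"
  local docker="${DOCKER:-docker}"                # hypothetical hook: DOCKER=echo is a dry run

  # Keep the full build output in build.log, as in the README command.
  # With pipefail, a failed docker build makes this pipeline fail and we return early.
  "$docker" build --memory=450g --platform linux/arm64 -t "${image}:${tag}" . 2>&1 \
    | tee build.log || return 1

  "$docker" push "${image}:${tag}"
}
```

Usage: run docker login once, then build_and_push 0.11.1rc2. Setting DOCKER=echo prints the two commands instead of running them, which is a cheap way to check the tag before a long build.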