README.md

# Building containers for GH200

Currently, prebuilt wheels for `vLLM` and `LMcache` are not available for `aarch64`. This can make setup tedious when working on modern `aarch64` platforms such as NVIDIA GH200.

Further, Nvidia at this time does not provide the `Dockerfile` associated with the NGC containers which makes replacing some of the components (like a newer version of vLLM) tedious.

This repository provides a Dockerfile to build a container with vLLM and all its dependencies pre-installed to try out various things such as KV offloading.

If you prefer not to build the image yourself, you can pull the ready-to-use image directly from Docker Hub:

```bash
docker run --rm -it --gpus all -v "$PWD":"$PWD" -w "$PWD" rajesh550/gh200-vllm:0.11.0 bash

# CUDA 13
docker run --rm -it --gpus all -v "$PWD":"$PWD" -w "$PWD" rajesh550/gh200-vllm:0.11.1rc2 bash
```

👉 [Docker Hub](https://hub.docker.com/repository/docker/rajesh550/gh200-vllm/general)

Version info:

```bash
CUDA: 13.0.1
Ubuntu: 24.04
Python: 3.12
PyTorch: 2.9.0+cu130
Triton: 3.5.x
xformers: 0.32.post2+
flashinfer: 0.4.1
flashattention: 3.0.0b1
LMCache: 0.3.7
vLLM: 0.11.1rc3
```
Update README.md 2025-09-24 01:43:49 -05:00			`# Building containers for GH200`

			Currently, prebuilt wheels for `vLLM` and `LMcache` are not available for `aarch64`. This can make setup tedious when working on modern `aarch64` platforms such as NVIDIA GH200.

Updated for CUDA 13 2025-10-21 19:21:13 +00:00			Further, Nvidia at this time does not provide the `Dockerfile` associated with the NGC containers which makes replacing some of the components (like a newer version of vLLM) tedious.

Update README.md 2025-09-24 01:43:49 -05:00			`This repository provides a Dockerfile to build a container with vLLM and all its dependencies pre-installed to try out various things such as KV offloading.`

			`If you prefer not to build the image yourself, you can pull the ready-to-use image directly from Docker Hub:`

Updated for v0.11.0 2025-10-16 01:08:21 +00:00			```bash
			`docker run --rm -it --gpus all -v "$PWD":"$PWD" -w "$PWD" rajesh550/gh200-vllm:0.11.0 bash`
Updated for CUDA 13 2025-10-21 19:21:13 +00:00
			`# CUDA 13`
Updated to vLLM v0.11.1rc3 2025-10-23 18:16:57 +00:00			`docker run --rm -it --gpus all -v "$PWD":"$PWD" -w "$PWD" rajesh550/gh200-vllm:0.11.1rc2 bash`
Updated for v0.11.0 2025-10-16 01:08:21 +00:00			```
Update README.md 2025-09-24 01:43:49 -05:00
			`👉 [Docker Hub](https://hub.docker.com/repository/docker/rajesh550/gh200-vllm/general)`

Updated for CUDA 13 2025-10-21 19:21:13 +00:00			`Version info:`

			```bash
			`CUDA: 13.0.1`
			`Ubuntu: 24.04`
			`Python: 3.12`
			`PyTorch: 2.9.0+cu130`
			`Triton: 3.5.x`
			`xformers: 0.32.post2+`
Updated to vLLM v0.11.1rc3 2025-10-23 18:16:57 +00:00			`flashinfer: 0.4.1`
			`flashattention: 3.0.0b1`
Updated for CUDA 13 2025-10-21 19:21:13 +00:00			`LMCache: 0.3.7`
Updated to vLLM v0.11.1rc3 2025-10-23 18:16:57 +00:00			`vLLM: 0.11.1rc3`
Updated for CUDA 13 2025-10-21 19:21:13 +00:00			```