# --8<-- [start:installation]
vLLM supports AMD GPUs with ROCm 6.3.
!!! warning
    There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
- GPU: MI200s (gfx90a), MI300 (gfx942), Radeon RX 7900 series (gfx1100/1101), Radeon RX 9000 series (gfx1200/1201)
- ROCm 6.3
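
Several of the build steps below need your GPU's gfx target. It can be read programmatically from `rocminfo`; a minimal sketch, demonstrated on a sample line since live `rocminfo` output requires a ROCm machine (on real hardware, pipe `rocminfo` straight into the `grep`):

```bash
# `sample` stands in for one line of `rocminfo` output; on a ROCm system use:
#   rocminfo | grep -o -m1 'gfx[0-9a-f]*'
sample='  Name:                    gfx90a'
arch=$(printf '%s\n' "$sample" | grep -o -m1 'gfx[0-9a-f]*')
echo "$arch"    # gfx90a
```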
# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built ROCm wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
0. Install prerequisites (skip if you are already in an environment/docker with the following installed):
    - [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html)
    - [PyTorch](https://pytorch.org/)
    For installing PyTorch, you can start from a fresh docker image, e.g. `rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0` or `rocm/pytorch-nightly`. If you are using a docker image, you can skip to Step 3.
    Alternatively, you can install PyTorch using PyTorch wheels; see the PyTorch [Getting Started](https://pytorch.org/get-started/locally/) guide for installation instructions. Example:
    ```bash
    # Install PyTorch
    pip uninstall torch -y
    pip install --no-cache-dir --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.3
    ```
1. Install [Triton flash attention for ROCm](https://github.com/ROCm/triton)
    Install ROCm's Triton flash attention (the default triton-mlir branch) following the instructions from [ROCm/triton](https://github.com/ROCm/triton/blob/triton-mlir/README.md).
    ```bash
    python3 -m pip install ninja cmake wheel pybind11
    pip uninstall -y triton
    git clone https://github.com/OpenAI/triton.git
    cd triton
    git checkout e5be006
    cd python
    pip3 install .
    cd ../..
    ```
    !!! note
        If you see an HTTP issue related to downloading packages while building Triton, please try again, as the HTTP error is intermittent.
2. Optionally, if you choose to use CK flash attention, you can install [flash attention for ROCm](https://github.com/ROCm/flash-attention)
    Install ROCm's flash attention (v2.7.2) following the instructions from [ROCm/flash-attention](https://github.com/ROCm/flash-attention#amd-rocm-support).
    Alternatively, wheels intended for vLLM use can be downloaded from the repository's releases.
    For example, for ROCm 6.3, suppose your gfx arch is `gfx90a`. To find your gfx architecture, run `rocminfo | grep gfx`.
    ```bash
    git clone https://github.com/ROCm/flash-attention.git
    cd flash-attention
    git checkout b7d29fb
    git submodule update --init
    GPU_ARCHS="gfx90a" python3 setup.py install
    cd ..
    ```
    !!! note
        You might need to downgrade the "ninja" version to 1.10 as it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`)
3. If you choose to build AITER yourself to use a certain branch or commit, you can build AITER using the following steps:
    ```bash
    python3 -m pip uninstall -y aiter
    git clone --recursive https://github.com/ROCm/aiter.git
    cd aiter
    git checkout $AITER_BRANCH_OR_COMMIT
    git submodule sync; git submodule update --init --recursive
    python3 setup.py develop
    ```
    !!! note
        Set `$AITER_BRANCH_OR_COMMIT` to the branch or commit you want to build.
4. Build vLLM. For example, vLLM on ROCm 6.3 can be built with the following steps:
    ??? console "Commands"

        ```bash
        pip install --upgrade pip

        # Build & install AMD SMI
        pip install /opt/rocm/share/amd_smi

        # Install dependencies
        pip install --upgrade numba \
            scipy \
            huggingface-hub[cli,hf_transfer] \
            setuptools_scm
        pip install "numpy<2"
        pip install -r requirements/rocm.txt

        # Build vLLM for MI210/MI250/MI300.
        export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
        python3 setup.py develop
        ```
    This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
!!! tip
    - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm-up step before collecting performance numbers.
    - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
    - To use CK flash-attention or PyTorch naive attention, please use this flag `export VLLM_USE_TRITON_FLASH_ATTN=0` to turn off Triton flash attention.
    - Ideally, the ROCm version of PyTorch should match the ROCm driver version.
!!! tip
    - For MI300x (gfx942) users, to achieve optimal performance, please refer to the [MI300x tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) for performance optimization and tuning tips on the system and workflow level.
      For vLLM, please refer to [vLLM performance optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#vllm-performance-optimization).
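
To try the alternative attention backends mentioned in the tips above, the flag is set in the shell that launches vLLM; a minimal sketch:

```bash
# Disable Triton flash attention for this shell session; vLLM will then
# fall back to CK flash-attention (if installed) or PyTorch naive attention.
export VLLM_USE_TRITON_FLASH_ATTN=0
echo "VLLM_USE_TRITON_FLASH_ATTN=${VLLM_USE_TRITON_FLASH_ATTN}"
```

Unset the variable to return to the default Triton backend.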
# --8<-- [start:set-up-using-docker]
# --8<-- [end:set-up-using-docker]
# --8<-- [start:pre-built-images]
The [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
!!! tip
    Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html)
    for instructions on how to use this prebuilt docker image.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
Building the Docker image from source is the recommended way to use vLLM with ROCm.
#### (Optional) Build an image with ROCm software stack
Build a docker image from <gh-file:docker/Dockerfile.rocm_base>, which sets up the ROCm software stack needed by vLLM.
**This step is optional, as this rocm_base image is usually prebuilt and stored on [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev) under the tag `rocm/vllm-dev:base` to speed up the user experience.**
If you choose to build this rocm_base image yourself, the steps are as follows.
It is important to kick off the docker build using BuildKit. Either set the `DOCKER_BUILDKIT=1` environment variable when calling the `docker build` command, or enable BuildKit in the docker daemon configuration `/etc/docker/daemon.json` as follows and restart the daemon:
```json
{
"features": {
"buildkit": true
}
}
```
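
Since a malformed `daemon.json` can keep the daemon from restarting, it can be worth validating the file before the restart. A minimal sketch, run here against a temporary copy rather than the real `/etc/docker/daemon.json`:

```bash
# Write the snippet above to a temporary copy and check that it parses
# as JSON and actually enables buildkit before editing the real file.
cat > /tmp/daemon.json <<'EOF'
{
  "features": {
    "buildkit": true
  }
}
EOF
python3 -c 'import json; cfg = json.load(open("/tmp/daemon.json")); assert cfg["features"]["buildkit"] is True' \
  && echo "buildkit enabled"
```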
To build vLLM on ROCm 6.3 for MI200 and MI300 series, you can use the default:
```bash
DOCKER_BUILDKIT=1 docker build \
-f docker/Dockerfile.rocm_base \
-t rocm/vllm-dev:base .
```
#### Build an image with vLLM
First, build a docker image from <gh-file:docker/Dockerfile.rocm> and launch a docker container from the image.
It is important to kick off the docker build using BuildKit. Either set the `DOCKER_BUILDKIT=1` environment variable when calling the `docker build` command, or enable BuildKit in the docker daemon configuration `/etc/docker/daemon.json` as follows and restart the daemon:
```json
{
"features": {
"buildkit": true
}
}
```
<gh-file:docker/Dockerfile.rocm> uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2 in older vLLM branches.
It provides flexibility to customize the build of docker image using the following arguments:
- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using <gh-file:docker/Dockerfile.rocm_base>
- `ARG_PYTORCH_ROCM_ARCH`: allows overriding the gfx architecture values from the base docker image
Their values can be passed in when running `docker build` with `--build-arg` options.
To build vLLM on ROCm 6.3 for MI200 and MI300 series, you can use the default:
```bash
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
```
To build vLLM on ROCm 6.3 for the Radeon RX 7900 series (gfx1100), you should pick the alternative base image:
```bash
DOCKER_BUILDKIT=1 docker build \
--build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" \
-f docker/Dockerfile.rocm \
-t vllm-rocm \
.
```
To run the above docker image `vllm-rocm`, use the command below:
??? console "Command"

    ```bash
    docker run -it \
        --network=host \
        --group-add=video \
        --ipc=host \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v <path/to/model>:/app/model \
        vllm-rocm \
        bash
    ```
Here `<path/to/model>` is the location where the model is stored, for example, the weights for Llama 2 or Llama 3 models.
# --8<-- [end:build-image-from-source]
# --8<-- [start:supported-features]
See the [feature-x-hardware][feature-x-hardware] compatibility matrix for feature support information.
# --8<-- [end:supported-features]
# --8<-- [end:extra-information]