[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501)
This commit is contained in:
@@ -1,6 +1,6 @@
|
||||
# Installation
|
||||
|
||||
vLLM supports AMD GPUs with ROCm 6.2.
|
||||
vLLM supports AMD GPUs with ROCm 6.3.
|
||||
|
||||
:::{attention}
|
||||
There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
|
||||
@@ -9,7 +9,7 @@ There are no pre-built wheels for this device, so you must either use the pre-bu
|
||||
## Requirements
|
||||
|
||||
- GPU: MI200s (gfx90a), MI300 (gfx942), Radeon RX 7900 series (gfx1100)
|
||||
- ROCm 6.2
|
||||
- ROCm 6.3
|
||||
|
||||
## Set up using Python
|
||||
|
||||
@@ -24,9 +24,15 @@ Currently, there are no pre-built ROCm wheels.
|
||||
- [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html)
|
||||
- [PyTorch](https://pytorch.org/)
|
||||
|
||||
For installing PyTorch, you can start from a fresh docker image, e.g, `rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0`, `rocm/pytorch-nightly`.
|
||||
For installing PyTorch, you can start from a fresh docker image, e.g, `rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0`, `rocm/pytorch-nightly`. If you are using docker image, you can skip to Step 3.
|
||||
|
||||
Alternatively, you can install PyTorch using PyTorch wheels. You can check PyTorch installation guide in PyTorch [Getting Started](https://pytorch.org/get-started/locally/)
|
||||
Alternatively, you can install PyTorch using PyTorch wheels. You can check PyTorch installation guide in PyTorch [Getting Started](https://pytorch.org/get-started/locally/). Example:
|
||||
|
||||
```console
|
||||
# Install PyTorch
|
||||
$ pip uninstall torch -y
|
||||
$ pip install --no-cache-dir --pre torch --index-url https://download.pytorch.org/whl/rocm6.3
|
||||
```
|
||||
|
||||
1. Install [Triton flash attention for ROCm](https://github.com/ROCm/triton)
|
||||
|
||||
@@ -37,7 +43,7 @@ Currently, there are no pre-built ROCm wheels.
|
||||
pip uninstall -y triton
|
||||
git clone https://github.com/OpenAI/triton.git
|
||||
cd triton
|
||||
git checkout e192dba
|
||||
git checkout e5be006
|
||||
cd python
|
||||
pip3 install .
|
||||
cd ../..
|
||||
@@ -49,15 +55,15 @@ Currently, there are no pre-built ROCm wheels.
|
||||
|
||||
2. Optionally, if you choose to use CK flash attention, you can install [flash attention for ROCm](https://github.com/ROCm/flash-attention/tree/ck_tile)
|
||||
|
||||
Install ROCm's flash attention (v2.5.9.post1) following the instructions from [ROCm/flash-attention](https://github.com/ROCm/flash-attention/tree/ck_tile#amd-gpurocm-support)
|
||||
Install ROCm's flash attention (v2.7.2) following the instructions from [ROCm/flash-attention](https://github.com/ROCm/flash-attention/tree/ck_tile#amd-gpurocm-support)
|
||||
Alternatively, wheels intended for vLLM use can be accessed under the releases.
|
||||
|
||||
For example, for ROCm 6.2, suppose your gfx arch is `gfx90a`. To get your gfx architecture, run `rocminfo |grep gfx`.
|
||||
For example, for ROCm 6.3, suppose your gfx arch is `gfx90a`. To get your gfx architecture, run `rocminfo |grep gfx`.
|
||||
|
||||
```console
|
||||
git clone https://github.com/ROCm/flash-attention.git
|
||||
cd flash-attention
|
||||
git checkout 3cea2fb
|
||||
git checkout b7d29fb
|
||||
git submodule update --init
|
||||
GPU_ARCHS="gfx90a" python3 setup.py install
|
||||
cd ..
|
||||
@@ -67,20 +73,16 @@ Currently, there are no pre-built ROCm wheels.
|
||||
You might need to downgrade the "ninja" version to 1.10 it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`)
|
||||
:::
|
||||
|
||||
3. Build vLLM. For example, vLLM on ROCM 6.2 can be built with the following steps:
|
||||
3. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps:
|
||||
|
||||
```bash
|
||||
$ pip install --upgrade pip
|
||||
|
||||
# Install PyTorch
|
||||
$ pip uninstall torch -y
|
||||
$ pip install --no-cache-dir --pre torch --index-url https://download.pytorch.org/whl/rocm6.2
|
||||
|
||||
# Build & install AMD SMI
|
||||
$ pip install /opt/rocm/share/amd_smi
|
||||
|
||||
# Install dependencies
|
||||
$ pip install --upgrade numba scipy huggingface-hub[cli]
|
||||
$ pip install --upgrade numba scipy huggingface-hub[cli,hf_transfer] setuptools_scm
|
||||
$ pip install "numpy<2"
|
||||
$ pip install -r requirements-rocm.txt
|
||||
|
||||
@@ -104,7 +106,7 @@ Currently, there are no pre-built ROCm wheels.
|
||||
For vLLM, please refer to [vLLM performance optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#vllm-performance-optimization).
|
||||
:::
|
||||
|
||||
## Set up using Docker
|
||||
## Set up using Docker (Recommended)
|
||||
|
||||
### Pre-built images
|
||||
|
||||
@@ -120,7 +122,12 @@ for instructions on how to use this prebuilt docker image.
|
||||
|
||||
Building the Docker image from source is the recommended way to use vLLM with ROCm.
|
||||
|
||||
First, build a docker image from <gh-file:Dockerfile.rocm> and launch a docker container from the image.
|
||||
#### (Optional) Build an image with ROCm software stack
|
||||
|
||||
Build a docker image from <gh-file:Dockerfile.rocm_base> which setup ROCm software stack needed by the vLLM.
|
||||
**This step is optional as this rocm_base image is usually prebuilt and store at [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev) under tag `rocm/vllm-dev:base` to speed up user experience.**
|
||||
If you choose to build this rocm_base image yourself, the steps are as follows.
|
||||
|
||||
It is important that the user kicks off the docker build using buildkit. Either the user put DOCKER_BUILDKIT=1 as environment variable when calling docker build command, or the user needs to setup buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:
|
||||
|
||||
```console
|
||||
@@ -131,7 +138,26 @@ It is important that the user kicks off the docker build using buildkit. Either
|
||||
}
|
||||
```
|
||||
|
||||
<gh-file:Dockerfile.rocm> uses ROCm 6.2 by default, but also supports ROCm 5.7, 6.0 and 6.1 in older vLLM branches.
|
||||
To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:
|
||||
|
||||
```console
|
||||
DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm_base -t rocm/vllm-dev:base .
|
||||
```
|
||||
|
||||
#### Build an image with vLLM
|
||||
|
||||
First, build a docker image from <gh-file:Dockerfile.rocm> and launch a docker container from the image.
|
||||
It is important that the user kicks off the docker build using buildkit. Either the user put `DOCKER_BUILDKIT=1` as environment variable when calling docker build command, or the user needs to setup buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:
|
||||
|
||||
```console
|
||||
{
|
||||
"features": {
|
||||
"buildkit": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<gh-file:Dockerfile.rocm> uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2, in older vLLM branches.
|
||||
It provides flexibility to customize the build of docker image using the following arguments:
|
||||
|
||||
- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using <gh-file:Dockerfile.rocm_base>
|
||||
@@ -141,13 +167,13 @@ It provides flexibility to customize the build of docker image using the followi
|
||||
|
||||
Their values can be passed in when running `docker build` with `--build-arg` options.
|
||||
|
||||
To build vllm on ROCm 6.2 for MI200 and MI300 series, you can use the default:
|
||||
To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:
|
||||
|
||||
```console
|
||||
DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
|
||||
```
|
||||
|
||||
To build vllm on ROCm 6.2 for Radeon RX7900 series (gfx1100), you should pick the alternative base image:
|
||||
To build vllm on ROCm 6.3 for Radeon RX7900 series (gfx1100), you should pick the alternative base image:
|
||||
|
||||
```console
|
||||
DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f Dockerfile.rocm -t vllm-rocm .
|
||||
|
||||
Reference in New Issue
Block a user