[DOC] [ROCm] Update docker deployment doc (#33971)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -174,67 +174,44 @@ uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.15.0/rocm700

# --8<-- [end:build-wheel-from-source]

# --8<-- [start:pre-built-images]

#### Use vLLM's Official Docker Image

vLLM offers an official Docker image for deployment.
The image can be used to run an OpenAI-compatible server and is available on Docker Hub as [vllm/vllm-openai-rocm](https://hub.docker.com/r/vllm/vllm-openai-rocm/tags).

???+ console "Commands"

    ```bash
    docker run --rm \
        --group-add=video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=$HF_TOKEN" \
        -p 8000:8000 \
        --ipc=host \
        vllm/vllm-openai-rocm:latest \
        --model Qwen/Qwen3-0.6B
    ```
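
For a quick sanity check, you can then query the OpenAI-compatible endpoint from the host (the model name matches the `--model` flag above):

???+ console "Commands"

    ```bash
    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello"}]}'
    ```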

To use the Docker image as a base for development, you can launch it in an interactive session by overriding the entrypoint.

???+ console "Commands"

    ```bash
    docker run --rm -it \
        --group-add=video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=$HF_TOKEN" \
        -p 8000:8000 \
        --ipc=host \
        --entrypoint bash \
        vllm/vllm-openai-rocm:latest
    ```
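
Inside the interactive shell you can then start the server by hand, for example:

???+ console "Commands"

    ```bash
    # Start the OpenAI-compatible server manually from the interactive shell
    vllm serve Qwen/Qwen3-0.6B --port 8000
    ```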

#### Use AMD's Docker Images

The [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
AMD also offers a nightly prebuilt docker image on [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev), which has vLLM and all its dependencies installed. The entrypoint of this docker image is `/bin/bash` (different from vLLM's official Docker image above).

???+ console "Commands"

    ```bash
    docker pull rocm/vllm-dev:nightly # to get the latest image
    docker run -it --rm \
        --network=host \
        --group-add=video \
        --ipc=host \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v <path/to/your/models>:/app/models \
        -e HF_HOME="/app/models" \
        rocm/vllm-dev:nightly
    ```
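
Because this image's entrypoint is `/bin/bash`, the server is started manually from the shell inside the container; downloaded weights land under the mounted `HF_HOME`, for example:

???+ console "Commands"

    ```bash
    # Weights are cached under HF_HOME=/app/models (the host directory mounted above)
    vllm serve Qwen/Qwen3-0.6B
    ```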

!!! tip
    Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html)

@@ -243,7 +220,7 @@ AMD also offers nightly prebuilt docker image from [Docker Hub](https://hub.dock

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

Building the Docker image from source is the recommended way to use vLLM with ROCm.
You can build and run vLLM from source via the provided [docker/Dockerfile.rocm](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm).

??? info "(Optional) Build an image with ROCm software stack"

@@ -269,8 +246,6 @@ Building the Docker image from source is the recommended way to use vLLM with RO

        -t rocm/vllm-dev:base .
    ```

#### Build an image with vLLM

First, build a docker image from [docker/Dockerfile.rocm](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm) and launch a docker container from the image.
It is important to kick off the docker build using BuildKit: either set `DOCKER_BUILDKIT=1` as an environment variable when calling `docker build`, or enable BuildKit in the Docker daemon configuration `/etc/docker/daemon.json` as follows and restart the daemon:

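A minimal sketch of the daemon-wide option (assumes a systemd-based Linux host; merge with any existing `daemon.json` rather than overwriting one you already have):

???+ console "Commands"

    ```bash
    # Write a daemon.json that enables BuildKit, then restart Docker
    sudo tee /etc/docker/daemon.json <<'EOF'
    {
        "features": {
            "buildkit": true
        }
    }
    EOF
    sudo systemctl restart docker
    ```
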
@@ -292,30 +267,46 @@ Their values can be passed in when running `docker build` with `--build-arg` opt

To build vLLM on ROCm 7.0 for MI200 and MI300 series, you can use the default (which builds a docker image with `vllm serve` as the entrypoint):

???+ console "Commands"

    ```bash
    DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm/vllm-openai-rocm .
    ```

To run vLLM with the custom-built Docker image:

???+ console "Commands"

    ```bash
    docker run --rm \
        --group-add=video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=$HF_TOKEN" \
        -p 8000:8000 \
        --ipc=host \
        vllm/vllm-openai-rocm <args...>
    ```

The argument `vllm/vllm-openai-rocm` specifies the image to run, and should be replaced with the name of the custom-built image (the `-t` tag from the build command).
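
Once the server is up, you can confirm it answers from the host, for example:

???+ console "Commands"

    ```bash
    # Lists the model(s) the server is currently serving
    curl http://localhost:8000/v1/models
    ```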

To use the Docker image as a base for development, you can launch it in an interactive session by overriding the entrypoint.

???+ console "Commands"

    ```bash
    docker run --rm -it \
        --group-add=video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=$HF_TOKEN" \
        --network=host \
        --ipc=host \
        --entrypoint bash \
        vllm/vllm-openai-rocm
    ```

# --8<-- [end:build-image-from-source]
# --8<-- [start:supported-features]