Move dockerfiles into their own directory (#14549)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -1,6 +1,6 @@
 # Dockerfile
 
-We provide a <gh-file:Dockerfile> to construct the image for running an OpenAI compatible server with vLLM.
+We provide a <gh-file:docker/Dockerfile> to construct the image for running an OpenAI compatible server with vLLM.
 More information about deploying with Docker can be found [here](#deployment-docker).
 
 Below is a visual representation of the multi-stage Dockerfile. The build graph contains the following nodes:
@@ -28,7 +28,7 @@ The edges of the build graph represent:
 > Commands to regenerate the build graph (make sure to run it **from the \`root\` directory of the vLLM repository** where the dockerfile is present):
 >
 > ```bash
-> dockerfilegraph -o png --legend --dpi 200 --max-label-length 50 --filename Dockerfile
+> dockerfilegraph -o png --legend --dpi 200 --max-label-length 50 --filename docker/Dockerfile
 > ```
 >
 > or in case you want to run it directly with the docker image:
@@ -43,7 +43,7 @@ The edges of the build graph represent:
 > --output png \
 > --dpi 200 \
 > --max-label-length 50 \
-> --filename Dockerfile \
+> --filename docker/Dockerfile \
 > --legend
 > ```
 >
@@ -45,7 +45,7 @@ pytest tests/
 ```
 
 :::{tip}
-Since the <gh-file:Dockerfile> ships with Python 3.12, all tests in CI (except `mypy`) are run with Python 3.12.
+Since the <gh-file:docker/Dockerfile> ships with Python 3.12, all tests in CI (except `mypy`) are run with Python 3.12.
 
 Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
 :::
@@ -61,11 +61,11 @@ RUN uv pip install --system git+https://github.com/huggingface/transformers.git
 
 ## Building vLLM's Docker Image from Source
 
-You can build and run vLLM from source via the provided <gh-file:Dockerfile>. To build vLLM:
+You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:
 
 ```console
 # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
-DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai
+DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --file docker/Dockerfile
 ```
 
 :::{note}
@@ -92,6 +92,7 @@ Keep an eye on memory usage with parallel jobs as it can be substantial (see exa
 # Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
 $ python3 use_existing_torch.py
 $ DOCKER_BUILDKIT=1 docker build . \
+--file docker/Dockerfile \
 --target vllm-openai \
 --platform "linux/arm64" \
 -t vllm/vllm-gh200-openai:latest \
@@ -69,14 +69,14 @@ server {
 
 ```console
 cd $vllm_root
-docker build -f Dockerfile . --tag vllm
+docker build -f docker/Dockerfile . --tag vllm
 ```
 
 If you are behind proxy, you can pass the proxy settings to the docker build command as shown below:
 
 ```console
 cd $vllm_root
-docker build -f Dockerfile . --tag vllm --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
+docker build -f docker/Dockerfile . --tag vllm --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
 ```
 
 (nginxloadbalancer-nginx-docker-network)=
@@ -86,7 +86,7 @@ Currently, there are no pre-built Intel Gaudi images.
 ### Build image from source
 
 ```console
-docker build -f Dockerfile.hpu -t vllm-hpu-env .
+docker build -f docker/Dockerfile.hpu -t vllm-hpu-env .
 docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
 ```
 
@@ -132,7 +132,7 @@ Currently, there are no pre-built Neuron images.
 
 See <project:#deployment-docker-build-image-from-source> for instructions on building the Docker image.
 
-Make sure to use <gh-file:Dockerfile.neuron> in place of the default Dockerfile.
+Make sure to use <gh-file:docker/Dockerfile.neuron> in place of the default Dockerfile.
 
 ## Extra information
 
@@ -169,10 +169,10 @@ See <project:#deployment-docker-pre-built-image> for instructions on using the o
 
 ### Build image from source
 
-You can use <gh-file:Dockerfile.tpu> to build a Docker image with TPU support.
+You can use <gh-file:docker/Dockerfile.tpu> to build a Docker image with TPU support.
 
 ```console
-docker build -f Dockerfile.tpu -t vllm-tpu .
+docker build -f docker/Dockerfile.tpu -t vllm-tpu .
 ```
 
 Run the Docker image with the following command:
@@ -177,7 +177,7 @@ Currently, there are no pre-built CPU wheels.
 ### Build image from source
 
 ```console
-$ docker build -f Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .
+$ docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .
 
 # Launching OpenAI server
 $ docker run --rm \
@@ -193,11 +193,11 @@ $ docker run --rm \
 ```
 
 ::::{tip}
-For ARM or Apple silicon, use `Dockerfile.arm`
+For ARM or Apple silicon, use `docker/Dockerfile.arm`
 ::::
 
 ::::{tip}
-For IBM Z (s390x), use `Dockerfile.s390x` and in `docker run` use flag `--dtype float`
+For IBM Z (s390x), use `docker/Dockerfile.s390x` and in `docker run` use flag `--dtype float`
 ::::
 
 ## Supported features
@@ -123,7 +123,7 @@ Building the Docker image from source is the recommended way to use vLLM with RO
 
 #### (Optional) Build an image with ROCm software stack
 
-Build a docker image from <gh-file:Dockerfile.rocm_base> which setup ROCm software stack needed by the vLLM.
+Build a docker image from <gh-file:docker/Dockerfile.rocm_base> which setup ROCm software stack needed by the vLLM.
 **This step is optional as this rocm_base image is usually prebuilt and store at [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev) under tag `rocm/vllm-dev:base` to speed up user experience.**
 If you choose to build this rocm_base image yourself, the steps are as follows.
 
@@ -140,12 +140,12 @@ It is important that the user kicks off the docker build using buildkit. Either
 To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:
 
 ```console
-DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm_base -t rocm/vllm-dev:base .
+DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm_base -t rocm/vllm-dev:base .
 ```
 
 #### Build an image with vLLM
 
-First, build a docker image from <gh-file:Dockerfile.rocm> and launch a docker container from the image.
+First, build a docker image from <gh-file:docker/Dockerfile.rocm> and launch a docker container from the image.
 It is important that the user kicks off the docker build using buildkit. Either the user put `DOCKER_BUILDKIT=1` as environment variable when calling docker build command, or the user needs to setup buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:
 
 ```console
@@ -156,10 +156,10 @@ It is important that the user kicks off the docker build using buildkit. Either
 }
 ```
 
-<gh-file:Dockerfile.rocm> uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2, in older vLLM branches.
+<gh-file:docker/Dockerfile.rocm> uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2, in older vLLM branches.
 It provides flexibility to customize the build of docker image using the following arguments:
 
-- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using <gh-file:Dockerfile.rocm_base>
+- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using <gh-file:docker/Dockerfile.rocm_base>
 - `USE_CYTHON`: An option to run cython compilation on a subset of python files upon docker build
 - `BUILD_RPD`: Include RocmProfileData profiling tool in the image
 - `ARG_PYTORCH_ROCM_ARCH`: Allows to override the gfx architecture values from the base docker image
@@ -169,13 +169,13 @@ Their values can be passed in when running `docker build` with `--build-arg` opt
 To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:
 
 ```console
-DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
+DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
 ```
 
 To build vllm on ROCm 6.3 for Radeon RX7900 series (gfx1100), you should pick the alternative base image:
 
 ```console
-DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f Dockerfile.rocm -t vllm-rocm .
+DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f docker/Dockerfile.rocm -t vllm-rocm .
 ```
 
 To run the above docker image `vllm-rocm`, use the below command:
@@ -54,7 +54,7 @@ Currently, there are no pre-built XPU images.
 ### Build image from source
 
 ```console
-$ docker build -f Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
+$ docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
 $ docker run -it \
 --rm \
 --network=host \
@@ -208,5 +208,5 @@ Currently, vLLM supports multiple backends for efficient Attention computation a
 If desired, you can also manually set the backend of your choice by configuring the environment variable `VLLM_ATTENTION_BACKEND` to one of the following options: `FLASH_ATTN`, `FLASHINFER` or `XFORMERS`.
 
 ```{attention}
-There are no pre-built vllm wheels containing Flash Infer, so you must install it in your environment first. Refer to the [Flash Infer official docs](https://docs.flashinfer.ai/) or see [Dockerfile](https://github.com/vllm-project/vllm/blob/main/Dockerfile) for instructions on how to install it.
+There are no pre-built vllm wheels containing Flash Infer, so you must install it in your environment first. Refer to the [Flash Infer official docs](https://docs.flashinfer.ai/) or see <gh-file:docker/Dockerfile> for instructions on how to install it.
 ```
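Reviewer note: every hunk above applies the same rule, namely that the Dockerfile path gains a `docker/` prefix while the build context stays at the repository root. A minimal sketch of that rule follows, assuming only the suffixes that appear in this diff; the helper names `dockerfile_for` and `build_command_for` are hypothetical and are not part of vLLM. The second helper only prints a command and does not invoke Docker.

```shell
# Hypothetical helper: map a platform suffix (as seen in this diff) to the
# Dockerfile path after the move into docker/.
dockerfile_for() {
  case "$1" in
    ""|default)
      printf '%s\n' "docker/Dockerfile" ;;
    hpu|neuron|tpu|cpu|arm|s390x|rocm|rocm_base|xpu)
      printf '%s\n' "docker/Dockerfile.$1" ;;
    *)
      # Suffixes not mentioned in this diff are rejected rather than guessed.
      printf 'unknown platform: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: print (not run) a build command for a platform and an
# image tag. The trailing "." keeps the repository root as the build context.
build_command_for() {
  file=$(dockerfile_for "$1") || return 1
  printf 'docker build -f %s -t %s .\n' "$file" "$2"
}
```

For example, `build_command_for tpu vllm-tpu` prints the same TPU command that the updated docs show. Platform-specific flags such as `--target vllm-openai` or `--shm-size=4g` are left out of the sketch and would still need to be added by hand.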