[Docs] Reduce custom syntax used in docs (#27009)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -4,19 +4,19 @@ vLLM is a Python library that supports the following CPU variants. Select your C

=== "Intel/AMD x86"

-    --8<-- "docs/getting_started/installation/cpu/x86.inc.md:installation"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:installation"

=== "ARM AArch64"

-    --8<-- "docs/getting_started/installation/cpu/arm.inc.md:installation"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:installation"

=== "Apple silicon"

-    --8<-- "docs/getting_started/installation/cpu/apple.inc.md:installation"
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:installation"

=== "IBM Z (S390X)"

-    --8<-- "docs/getting_started/installation/cpu/s390x.inc.md:installation"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:installation"

## Requirements
@@ -24,19 +24,19 @@ vLLM is a Python library that supports the following CPU variants. Select your C

=== "Intel/AMD x86"

-    --8<-- "docs/getting_started/installation/cpu/x86.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:requirements"

=== "ARM AArch64"

-    --8<-- "docs/getting_started/installation/cpu/arm.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:requirements"

=== "Apple silicon"

-    --8<-- "docs/getting_started/installation/cpu/apple.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:requirements"

=== "IBM Z (S390X)"

-    --8<-- "docs/getting_started/installation/cpu/s390x.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:requirements"

## Set up using Python
@@ -52,19 +52,19 @@ Currently, there are no pre-built CPU wheels.

=== "Intel/AMD x86"

-    --8<-- "docs/getting_started/installation/cpu/x86.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:build-wheel-from-source"

=== "ARM AArch64"

-    --8<-- "docs/getting_started/installation/cpu/arm.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:build-wheel-from-source"

=== "Apple silicon"

-    --8<-- "docs/getting_started/installation/cpu/apple.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:build-wheel-from-source"

=== "IBM Z (s390x)"

-    --8<-- "docs/getting_started/installation/cpu/s390x.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:build-wheel-from-source"

## Set up using Docker
@@ -72,24 +72,24 @@ Currently, there are no pre-built CPU wheels.

=== "Intel/AMD x86"

-    --8<-- "docs/getting_started/installation/cpu/x86.inc.md:pre-built-images"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"

### Build image from source

=== "Intel/AMD x86"

-    --8<-- "docs/getting_started/installation/cpu/x86.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:build-image-from-source"

=== "ARM AArch64"

-    --8<-- "docs/getting_started/installation/cpu/arm.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:build-image-from-source"

=== "Apple silicon"

-    --8<-- "docs/getting_started/installation/cpu/arm.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:build-image-from-source"

=== "IBM Z (S390X)"

-    --8<-- "docs/getting_started/installation/cpu/s390x.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:build-image-from-source"

## Related runtime environment variables
@@ -157,7 +157,7 @@ See [deployment-docker-pre-built-image][deployment-docker-pre-built-image] for i

### Build image from source

-You can use <gh-file:docker/Dockerfile.tpu> to build a Docker image with TPU support.
+You can use [docker/Dockerfile.tpu](../../../docker/Dockerfile.tpu) to build a Docker image with TPU support.

```bash
docker build -f docker/Dockerfile.tpu -t vllm-tpu .
@@ -11,7 +11,7 @@ vLLM contains pre-compiled C++ and CUDA (12.8) binaries.

# --8<-- [start:set-up-using-python]

!!! note
-    PyTorch installed via `conda` will statically link `NCCL` library, which can cause issues when vLLM tries to use `NCCL`. See <gh-issue:8420> for more details.
+    PyTorch installed via `conda` will statically link `NCCL` library, which can cause issues when vLLM tries to use `NCCL`. See <https://github.com/vllm-project/vllm/issues/8420> for more details.

In order to be performant, vLLM has to compile many cuda kernels. The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.
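For context on the include lines this commit rewrites: the `--8<--` syntax comes from the PyMdown Extensions snippets feature. A named section is delimited by paired start/end markers inside an `.inc.md` file, and an include line pulls it in as `"path:section"`. A minimal sketch (the `requirements` section name here is just illustrative):

```markdown
<!-- inside some-page.inc.md -->
# --8<-- [start:requirements]
Content between the markers is the named section.
# --8<-- [end:requirements]

<!-- in the including page -->
--8<-- "docs/some-page.inc.md:requirements"
```

Only the snippet *path* changes in this diff (directory separators become dots); the section names after the colon are untouched, so the start/end markers in the included files keep working.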
@@ -4,15 +4,15 @@ vLLM is a Python library that supports the following GPU variants. Select your G

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:installation"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:installation"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:installation"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:installation"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:installation"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:installation"

## Requirements
@@ -24,15 +24,15 @@ vLLM is a Python library that supports the following GPU variants. Select your G

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:requirements"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:requirements"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:requirements"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:requirements"

## Set up using Python
@@ -42,29 +42,29 @@ vLLM is a Python library that supports the following GPU variants. Select your G

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:set-up-using-python"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:set-up-using-python"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:set-up-using-python"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:set-up-using-python"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:set-up-using-python"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:set-up-using-python"

### Pre-built wheels

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:pre-built-wheels"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:pre-built-wheels"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:pre-built-wheels"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:pre-built-wheels"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:pre-built-wheels"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:pre-built-wheels"

[](){ #build-from-source }
@@ -72,15 +72,15 @@ vLLM is a Python library that supports the following GPU variants. Select your G

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:build-wheel-from-source"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:build-wheel-from-source"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:build-wheel-from-source"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:build-wheel-from-source"

## Set up using Docker
@@ -88,40 +88,40 @@ vLLM is a Python library that supports the following GPU variants. Select your G

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:pre-built-images"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:pre-built-images"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:pre-built-images"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:pre-built-images"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:pre-built-images"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:pre-built-images"

### Build image from source

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:build-image-from-source"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:build-image-from-source"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:build-image-from-source"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:build-image-from-source"

## Supported features

=== "NVIDIA CUDA"

-    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:supported-features"
+    --8<-- "docs/getting_started/installation/gpu.cuda.inc.md:supported-features"

=== "AMD ROCm"

-    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:supported-features"
+    --8<-- "docs/getting_started/installation/gpu.rocm.inc.md:supported-features"

=== "Intel XPU"

-    --8<-- "docs/getting_started/installation/gpu/xpu.inc.md:supported-features"
+    --8<-- "docs/getting_started/installation/gpu.xpu.inc.md:supported-features"
@@ -146,7 +146,7 @@ Building the Docker image from source is the recommended way to use vLLM with RO

#### (Optional) Build an image with ROCm software stack

-Build a docker image from <gh-file:docker/Dockerfile.rocm_base> which setup ROCm software stack needed by the vLLM.
+Build a docker image from [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base) which setup ROCm software stack needed by the vLLM.
**This step is optional as this rocm_base image is usually prebuilt and store at [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev) under tag `rocm/vllm-dev:base` to speed up user experience.**
If you choose to build this rocm_base image yourself, the steps are as follows.
@@ -170,7 +170,7 @@ DOCKER_BUILDKIT=1 docker build \

#### Build an image with vLLM

-First, build a docker image from <gh-file:docker/Dockerfile.rocm> and launch a docker container from the image.
+First, build a docker image from [docker/Dockerfile.rocm](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm) and launch a docker container from the image.
It is important that the user kicks off the docker build using buildkit. Either the user put `DOCKER_BUILDKIT=1` as environment variable when calling docker build command, or the user needs to set up buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:

```bash
@@ -181,10 +181,10 @@ It is important that the user kicks off the docker build using buildkit. Either
}
```

-<gh-file:docker/Dockerfile.rocm> uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2, in older vLLM branches.
+[docker/Dockerfile.rocm](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm) uses ROCm 6.3 by default, but also supports ROCm 5.7, 6.0, 6.1, and 6.2, in older vLLM branches.
It provides flexibility to customize the build of docker image using the following arguments:

-- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using <gh-file:docker/Dockerfile.rocm_base>
+- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is being built using [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base)
- `ARG_PYTORCH_ROCM_ARCH`: Allows to override the gfx architecture values from the base docker image

Their values can be passed in when running `docker build` with `--build-arg` options.
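The daemon.json fragment in this hunk's context is truncated to its closing `}` by the diff window. For orientation only, a minimal `/etc/docker/daemon.json` that enables BuildKit looks like the following; this is a sketch based on Docker's documented `features.buildkit` flag, not necessarily the exact file shown in the docs page:

```json
{
  "features": {
    "buildkit": true
  }
}
```

After editing the file, the Docker daemon must be restarted for the setting to take effect; alternatively, `DOCKER_BUILDKIT=1` on the `docker build` command line achieves the same per-invocation.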
@@ -75,7 +75,7 @@ vllm serve facebook/opt-13b \
    -tp=8
```

-By default, a ray instance will be launched automatically if no existing one is detected in the system, with `num-gpus` equals to `parallel_config.world_size`. We recommend properly starting a ray cluster before execution, referring to the <gh-file:examples/online_serving/run_cluster.sh> helper script.
+By default, a ray instance will be launched automatically if no existing one is detected in the system, with `num-gpus` equals to `parallel_config.world_size`. We recommend properly starting a ray cluster before execution, referring to the [examples/online_serving/run_cluster.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/run_cluster.sh) helper script.

# --8<-- [end:supported-features]
# --8<-- [start:distributed-backend]