[Doc] Convert docs to use colon fences (#12471)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Author: Harry Mellor
Date: 2025-01-29 03:38:29 +00:00
Committed by: GitHub
parent a7e3eba66f
commit dd6a3a02cb
68 changed files with 2352 additions and 2341 deletions
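For context, the change replaces MyST backtick directive fences with colon fences throughout the docs. Both forms render the same admonition; a commonly cited advantage of colon fences is that plain-Markdown renderers still show their contents as readable text rather than as a literal code block. A representative snippet of the new form (the `note` directive, as used in the hunks below):

```markdown
:::{note}
Colon fences take the same `{directive}` name and options as the old
backtick fences, but the body remains readable as plain Markdown.
:::
```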


@@ -10,9 +10,9 @@ vLLM contains pre-compiled C++ and CUDA (12.1) binaries.
 ### Create a new Python environment
-```{note}
+:::{note}
 PyTorch installed via `conda` will statically link `NCCL` library, which can cause issues when vLLM tries to use `NCCL`. See <gh-issue:8420> for more details.
-```
+:::
 In order to be performant, vLLM has to compile many CUDA kernels. The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.
@@ -100,10 +100,10 @@ pip install --editable .
 You can find more information about vLLM's wheels in <project:#install-the-latest-code>.
-```{note}
+:::{note}
 There is a possibility that your source code may have a different commit ID compared to the latest vLLM wheel, which could potentially lead to unknown errors.
 It is recommended to use the same commit ID for the source code as the vLLM wheel you have installed. Please refer to <project:#install-the-latest-code> for instructions on how to install a specified wheel.
-```
+:::
#### Full build (with compilation)
@@ -115,7 +115,7 @@ cd vllm
 pip install -e .
 ```
-```{tip}
+:::{tip}
 Building from source requires a lot of compilation. If you are building from source repeatedly, it's more efficient to cache the compilation results.
 For example, you can install [ccache](https://github.com/ccache/ccache) using `conda install ccache` or `apt install ccache`.
@@ -123,7 +123,7 @@ As long as `which ccache` command can find the `ccache` binary, it will be used
 [sccache](https://github.com/mozilla/sccache) works similarly to `ccache`, but has the capability to utilize caching in remote storage environments.
 The following environment variables can be set to configure the vLLM `sccache` remote: `SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1`. We also recommend setting `SCCACHE_IDLE_TIMEOUT=0`.
-```
+:::
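As an aside, the `sccache` settings quoted in the tip above are plain environment variables; a minimal sketch of exporting them before a build (the bucket, region and timeout values are the ones named in the tip, not defaults of `sccache` itself):

```shell
# Remote-cache settings for sccache, as listed in the tip above.
export SCCACHE_BUCKET=vllm-build-sccache   # S3 bucket holding the build cache
export SCCACHE_REGION=us-west-2            # region of that bucket
export SCCACHE_S3_NO_CREDENTIALS=1         # read the bucket anonymously
export SCCACHE_IDLE_TIMEOUT=0              # keep the sccache server alive
```

With these in place, a subsequent source build should pick up the remote cache, presumably as long as the `sccache` binary is found on `PATH` (mirroring the `which ccache` behaviour described above).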
##### Use an existing PyTorch installation


@@ -2,299 +2,299 @@
 vLLM is a Python library that supports the following GPU variants. Select your GPU type to see vendor specific instructions:
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "# Installation"
 :end-before: "## Requirements"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "# Installation"
-:end-before: "## Requirements"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "# Installation"
-:end-before: "## Requirements"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "# Installation"
+:end-before: "## Requirements"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "# Installation"
+:end-before: "## Requirements"
+:::
+::::
+:::::
 ## Requirements
 - OS: Linux
 - Python: 3.9 -- 3.12
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "## Requirements"
 :end-before: "## Set up using Python"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "## Requirements"
-:end-before: "## Set up using Python"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "## Requirements"
-:end-before: "## Set up using Python"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "## Requirements"
+:end-before: "## Set up using Python"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "## Requirements"
+:end-before: "## Set up using Python"
+:::
+::::
+:::::
 ## Set up using Python
 ### Create a new Python environment
-```{include} ../python_env_setup.inc.md
-```
-::::{tab-set}
+:::{include} ../python_env_setup.inc.md
+:::
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "## Create a new Python environment"
 :end-before: "### Pre-built wheels"
-```
 :::
-:::{tab-item} ROCm
+::::
+::::{tab-item} ROCm
 :sync: rocm
 There is no extra information on creating a new Python environment for this device.
-:::
-:::{tab-item} XPU
+::::
+::::{tab-item} XPU
 :sync: xpu
 There is no extra information on creating a new Python environment for this device.
-:::
-::::
+::::
+:::::
 ### Pre-built wheels
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "### Pre-built wheels"
 :end-before: "### Build wheel from source"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "### Pre-built wheels"
-:end-before: "### Build wheel from source"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "### Pre-built wheels"
-:end-before: "### Build wheel from source"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "### Pre-built wheels"
+:end-before: "### Build wheel from source"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "### Pre-built wheels"
+:end-before: "### Build wheel from source"
+:::
+::::
+:::::
 (build-from-source)=
 ### Build wheel from source
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "### Build wheel from source"
 :end-before: "## Set up using Docker"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "### Build wheel from source"
-:end-before: "## Set up using Docker"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "### Build wheel from source"
-:end-before: "## Set up using Docker"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "### Build wheel from source"
+:end-before: "## Set up using Docker"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "### Build wheel from source"
+:end-before: "## Set up using Docker"
+:::
+::::
+:::::
 ## Set up using Docker
 ### Pre-built images
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "### Pre-built images"
 :end-before: "### Build image from source"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "### Pre-built images"
-:end-before: "### Build image from source"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "### Pre-built images"
-:end-before: "### Build image from source"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "### Pre-built images"
+:end-before: "### Build image from source"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "### Pre-built images"
+:end-before: "### Build image from source"
+:::
+::::
+:::::
 ### Build image from source
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "### Build image from source"
 :end-before: "## Supported features"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "### Build image from source"
-:end-before: "## Supported features"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "### Build image from source"
-:end-before: "## Supported features"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "### Build image from source"
+:end-before: "## Supported features"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "### Build image from source"
+:end-before: "## Supported features"
+:::
+::::
+:::::
 ## Supported features
-::::{tab-set}
+:::::{tab-set}
 :sync-group: device
-:::{tab-item} CUDA
+::::{tab-item} CUDA
 :sync: cuda
-```{include} cuda.inc.md
+:::{include} cuda.inc.md
 :start-after: "## Supported features"
-```
 :::
-:::{tab-item} ROCm
-:sync: rocm
-```{include} rocm.inc.md
-:start-after: "## Supported features"
-```
-:::
-:::{tab-item} XPU
-:sync: xpu
-```{include} xpu.inc.md
-:start-after: "## Supported features"
-```
-:::
-::::
+::::
+::::{tab-item} ROCm
+:sync: rocm
+:::{include} rocm.inc.md
+:start-after: "## Supported features"
+:::
+::::
+::::{tab-item} XPU
+:sync: xpu
+:::{include} xpu.inc.md
+:start-after: "## Supported features"
+:::
+::::
+:::::


@@ -16,10 +16,10 @@ Currently, there are no pre-built ROCm wheels.
 However, the [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
 docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
-```{tip}
+:::{tip}
 Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html)
 for instructions on how to use this prebuilt docker image.
-```
+:::
### Build wheel from source
@@ -47,9 +47,9 @@ for instructions on how to use this prebuilt docker image.
 cd ../..
 ```
-```{note}
-- If you see HTTP issue related to downloading packages during building triton, please try again as the HTTP error is intermittent.
-```
+:::{note}
+If you see an HTTP issue related to downloading packages while building Triton, please try again, as the HTTP error is intermittent.
+:::
 2. Optionally, if you choose to use CK flash attention, you can install [flash attention for ROCm](https://github.com/ROCm/flash-attention/tree/ck_tile)
@@ -67,9 +67,9 @@ for instructions on how to use this prebuilt docker image.
 cd ..
 ```
-```{note}
-- You might need to downgrade the "ninja" version to 1.10 it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`)
-```
+:::{note}
+You might need to downgrade the "ninja" version to 1.10, as it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`).
+:::
3. Build vLLM. For example, vLLM on ROCM 6.2 can be built with the following steps:
@@ -95,17 +95,18 @@ for instructions on how to use this prebuilt docker image.
 This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
-```{tip}
+<!--- pyml disable-num-lines 5 ul-indent-->
+:::{tip}
 - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm up step before collecting perf numbers.
 - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
 - To use CK flash-attention or PyTorch naive attention, please use this flag `export VLLM_USE_TRITON_FLASH_ATTN=0` to turn off triton flash attention.
 - The ROCm version of PyTorch, ideally, should match the ROCm driver version.
-```
+:::
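The attention-backend switch quoted in the tip above is likewise just an environment variable; a minimal sketch (the variable name and value are the ones quoted in the tip):

```shell
# Turn off Triton flash attention so vLLM falls back to CK flash-attention
# or PyTorch naive attention, per the tip above.
export VLLM_USE_TRITON_FLASH_ATTN=0
```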
-```{tip}
+:::{tip}
 - For MI300x (gfx942) users, to achieve optimal performance, please refer to [MI300x tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) for performance optimization and tuning tips on system and workflow level.
 For vLLM, please refer to [vLLM performance optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#vllm-performance-optimization).
-```
+:::
## Set up using Docker


@@ -30,10 +30,10 @@ pip install -v -r requirements-xpu.txt
 VLLM_TARGET_DEVICE=xpu python setup.py install
 ```
-```{note}
+:::{note}
 - FP16 is the default data type in the current XPU backend. The BF16 data
 type will be supported in the future.
-```
+:::
## Set up using Docker