[doc] update doc format (#20673)
Signed-off-by: reidliu41 <reid201711@gmail.com>
@@ -16,11 +16,12 @@ by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM with PyTorch release candidates (RC) to ensure
compatibility before each release.

PyTorch release candidates can be downloaded from the [PyTorch test index](https://download.pytorch.org/whl/test).
For example, the `torch2.7.0+cu12.8` RC can be installed using the following command:

```bash
uv pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/test/cu128
```

When the final RC is ready for testing, it will be announced to the community
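Assuming the test index simply appends the platform suffix to its base URL (as the `cu128` example above suggests), the install command for any platform can be composed like this; the `platform` variable and the helper script itself are illustrative, not part of vLLM:

```shell
#!/bin/sh
# Sketch: compose the RC install command for a given platform suffix.
# Platform suffixes (cpu, cu128, rocm6.2.4, ...) mirror the test index layout.
platform="cu128"
index_url="https://download.pytorch.org/whl/test/${platform}"
echo "uv pip install torch torchvision torchaudio --index-url ${index_url}"
```

Swapping `platform` for `cpu` or `rocm6.2.4` yields the corresponding install command.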
@@ -28,13 +29,28 @@ on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-ann
After this announcement, we can begin testing vLLM integration by drafting a pull request
following this 3-step process:

1. Update the [requirements files](https://github.com/vllm-project/vllm/tree/main/requirements)
   to point to the new releases for `torch`, `torchvision`, and `torchaudio`.

2. Use the following option to get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`, and `rocm6.2.4`.

    ```bash
    --extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>
    ```

3. Since vLLM uses `uv`, ensure that the `unsafe-best-match` index strategy is applied:

    - Via environment variable:

        ```bash
        export UV_INDEX_STRATEGY=unsafe-best-match
        ```

    - Or via CLI flag:

        ```bash
        --index-strategy unsafe-best-match
        ```

If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.
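The environment variable and the CLI flag are interchangeable ways of selecting the same strategy. As a minimal sketch (the `vllm` package name is just a stand-in for whatever is being installed, and the command is printed rather than run), a wrapper script might add the flag only when the variable is unset:

```shell
#!/bin/sh
# Sketch: add the --index-strategy flag only when UV_INDEX_STRATEGY
# is not already exported; uv honors either form.
if [ -z "${UV_INDEX_STRATEGY:-}" ]; then
    extra_args="--index-strategy unsafe-best-match"
else
    extra_args=""
fi
# Print the resulting command for illustration instead of executing it.
echo "uv pip install vllm ${extra_args}"
```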
@@ -42,20 +58,25 @@ cc the PyTorch release team to initiate discussion on how to address them.
## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
`torch2.7.0+cu12.6`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.

1. Use `--extra-index-url https://download.pytorch.org/whl/cu128` to install `torch+cu128`. Other important indexes at the moment include:

    | Platform | `--extra-index-url` |
    |----------|---------------------|
    | CUDA 12.8 | [https://download.pytorch.org/whl/cu128](https://download.pytorch.org/whl/cu128) |
    | CPU | [https://download.pytorch.org/whl/cpu](https://download.pytorch.org/whl/cpu) |
    | ROCm 6.2 | [https://download.pytorch.org/whl/rocm6.2.4](https://download.pytorch.org/whl/rocm6.2.4) |
    | ROCm 6.3 | [https://download.pytorch.org/whl/rocm6.3](https://download.pytorch.org/whl/rocm6.3) |
    | XPU | [https://download.pytorch.org/whl/xpu](https://download.pytorch.org/whl/xpu) |

2. Update the files below to match the CUDA version from step 1. This makes sure that the release vLLM wheel is tested on CI.

    - `.buildkite/release-pipeline.yaml`
    - `.buildkite/scripts/upload-wheels.sh`

## Address long vLLM build time
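The platform-to-index mapping in the table above is a fixed lookup, which a build script could encode directly; this is a sketch with a helper function of our own naming, not part of vLLM's tooling:

```shell
#!/bin/sh
# Sketch: map a build platform name to its PyTorch extra index URL.
# Platform names follow the table above; extra_index_for is illustrative.
extra_index_for() {
    case "$1" in
        cu128)     echo "https://download.pytorch.org/whl/cu128" ;;
        cpu)       echo "https://download.pytorch.org/whl/cpu" ;;
        rocm6.2.4) echo "https://download.pytorch.org/whl/rocm6.2.4" ;;
        rocm6.3)   echo "https://download.pytorch.org/whl/rocm6.3" ;;
        xpu)       echo "https://download.pytorch.org/whl/xpu" ;;
        *)         echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

extra_index_for cu128   # → https://download.pytorch.org/whl/cu128
```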
@@ -66,7 +87,7 @@ it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.

While ongoing efforts like [#17419](gh-issue:17419)
address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:
@@ -86,17 +107,18 @@ releases (which would take too much time), they can be built from
source to unblock the update process.

### FlashInfer

Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):

```bash
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
export FLASHINFER_ENABLE_SM90=1
uv pip install --system \
    --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
```

One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as [this FlashInfer wheel link](https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl). For future releases, contact the PyTorch release
team if you want to get the package published there.

### xFormers
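A build script can then prefer the cached wheel and fall back to the slow source build; this is a sketch of that choice, where `USE_CACHED_WHEEL` is an illustrative toggle rather than a real vLLM or Dockerfile variable, and the commands are printed instead of executed:

```shell
#!/bin/sh
# Sketch: prefer a cached FlashInfer wheel over the ~30-minute source build.
# USE_CACHED_WHEEL is illustrative only, not a real vLLM variable.
wheel_url="https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
src_ref="git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"

if [ "${USE_CACHED_WHEEL:-1}" = "1" ]; then
    # Fast path: install the prebuilt wheel directly.
    echo "uv pip install --system ${wheel_url}"
else
    # Slow path: build from source, adding roughly 30 minutes.
    echo "uv pip install --system --no-build-isolation ${src_ref}"
fi
```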
@@ -104,13 +126,15 @@ Similar to FlashInfer, here is how to build and install xFormers from source:
```bash
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
MAX_JOBS=16 uv pip install --system \
    --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```

### Mamba

```bash
uv pip install --system \
    --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
```

### causal-conv1d
@@ -125,6 +149,6 @@ Rather than attempting to update all vLLM platforms in a single pull request, it
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to selectively choose
which platforms to update. For instance, updating XPU requires the corresponding
release from [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) by Intel.
While <gh-pr:16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
<gh-pr:17444> completed the update for XPU.