[doc] improve readability (#18675)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
@@ -45,7 +45,15 @@ Use the following commands to run a Docker image:
 
 ```console
 docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+docker run \
+    -it \
+    --runtime=habana \
+    -e HABANA_VISIBLE_DEVICES=all \
+    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
+    --cap-add=sys_nice \
+    --net=host \
+    --ipc=host \
+    vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```
 
 # --8<-- [end:requirements]
@@ -91,7 +99,14 @@ Currently, there are no pre-built Intel Gaudi images.
 
 ```console
 docker build -f docker/Dockerfile.hpu -t vllm-hpu-env .
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
+docker run \
+    -it \
+    --runtime=habana \
+    -e HABANA_VISIBLE_DEVICES=all \
+    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
+    --cap-add=sys_nice \
+    --net=host \
+    --rm vllm-hpu-env
 ```
 
 !!! tip
@@ -38,7 +38,8 @@ The installation of drivers and tools wouldn't be necessary, if [Deep Learning A
 sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
 deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
 EOF
-wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
+wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB \
+    | sudo apt-key add -
 
 # Update OS packages
 sudo apt-get update -y
@@ -96,12 +97,17 @@ source aws_neuron_venv_pytorch/bin/activate
 
 # Install Jupyter notebook kernel
 pip install ipykernel
-python3.10 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
+python3.10 -m ipykernel install \
+    --user \
+    --name aws_neuron_venv_pytorch \
+    --display-name "Python (torch-neuronx)"
 pip install jupyter notebook
 pip install environment_kernels
 
 # Set pip repository pointing to the Neuron repository
-python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
+python -m pip config set \
+    global.extra-index-url \
+    https://pip.repos.neuron.amazonaws.com
 
 # Install wget, awscli
 python -m pip install wget
@@ -55,7 +55,9 @@ LLM inference is a fast-evolving field, and the latest code may contain bug fixe
 ##### Install the latest code using `pip`
 
 ```console
-pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+pip install -U vllm \
+    --pre \
+    --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 `--pre` is required for `pip` to consider pre-released versions.
@@ -63,7 +65,9 @@ pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 Another way to install the latest code is to use `uv`:
 
 ```console
-uv pip install -U vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
+uv pip install -U vllm \
+    --torch-backend=auto \
+    --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 ##### Install specific revisions using `pip`
@@ -83,7 +87,9 @@ If you want to access the wheels for previous commits (e.g. to bisect the behavi
 
 ```console
 export VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069 # use full commit hash from the main branch
-uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}
+uv pip install vllm \
+    --torch-backend=auto \
+    --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}
 ```
 
 The `uv` approach works for vLLM `v0.6.6` and later and offers an easy-to-remember command. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
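The per-commit index in the hunk above is just the wheel host followed by the full commit hash. As a minimal sketch, the URL can be built and sanity-checked before it is handed to `uv`. The helper name `vllm_wheel_index_url` is hypothetical and not part of vLLM or `uv`; the 40-character check assumes SHA-1 commit hashes, matching the "use full commit hash" comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of vLLM or uv): build the per-commit wheel
# index URL for --extra-index-url, rejecting abbreviated hashes.
vllm_wheel_index_url() {
    commit=$1
    # Assumption: a full Git SHA-1 is exactly 40 lowercase hex characters.
    case $commit in
        ''|*[!0-9a-f]*)
            echo "not a lowercase hex commit hash: $commit" >&2
            return 1
            ;;
    esac
    if [ "${#commit}" -ne 40 ]; then
        echo "expected a full 40-character hash, got ${#commit} characters" >&2
        return 1
    fi
    printf 'https://wheels.vllm.ai/%s\n' "$commit"
}

VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069
index_url=$(vllm_wheel_index_url "$VLLM_COMMIT") && echo "$index_url"

# The install step would then be (not executed in this sketch):
#   uv pip install vllm --torch-backend=auto --extra-index-url "$index_url"
```

Rejecting short hashes up front turns a confusing resolver failure into an immediate, readable error.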
@@ -192,7 +198,11 @@ Additionally, if you have trouble building vLLM, we recommend using the NVIDIA P
 
 ```console
 # Use `--ipc=host` to make sure the shared memory is large enough.
-docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
+docker run \
+    --gpus all \
+    -it \
+    --rm \
+    --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
 ```
 
 If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from [the official website](https://developer.nvidia.com/cuda-toolkit-archive). After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:
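The example that follows "e.g.:" lies outside this hunk. As a rough sketch of what setting and verifying those two variables might look like: `/usr/local/cuda` is only an assumed default install location, and `check_cuda_env` is a hypothetical helper, not something vLLM ships.

```shell
#!/bin/sh
# Assumption: the toolkit lives in /usr/local/cuda unless CUDA_HOME is
# already set; adjust to the actual installation path.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export PATH="$CUDA_HOME/bin:$PATH"

# Hypothetical check (not part of vLLM): confirm CUDA_HOME points at a
# toolkit directory and that nvcc is reachable through PATH.
check_cuda_env() {
    if [ -z "${CUDA_HOME:-}" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
        echo "no nvcc executable under $CUDA_HOME/bin" >&2
        return 1
    fi
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc is not on PATH; add $CUDA_HOME/bin" >&2
        return 1
    fi
    echo "CUDA_HOME=$CUDA_HOME, nvcc=$(command -v nvcc)"
}

check_cuda_env || echo "CUDA environment is incomplete"
```

Running the check before a source build surfaces a missing toolkit early instead of partway through compilation.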
@@ -91,19 +91,22 @@ Currently, there are no pre-built ROCm wheels.
 4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps:
 
 ```bash
-$ pip install --upgrade pip
+pip install --upgrade pip
 
 # Build & install AMD SMI
-$ pip install /opt/rocm/share/amd_smi
+pip install /opt/rocm/share/amd_smi
 
 # Install dependencies
-$ pip install --upgrade numba scipy huggingface-hub[cli,hf_transfer] setuptools_scm
-$ pip install "numpy<2"
-$ pip install -r requirements/rocm.txt
+pip install --upgrade numba \
+    scipy \
+    huggingface-hub[cli,hf_transfer] \
+    setuptools_scm
+pip install "numpy<2"
+pip install -r requirements/rocm.txt
 
 # Build vLLM for MI210/MI250/MI300.
-$ export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
-$ python3 setup.py develop
+export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
+python3 setup.py develop
 ```
 
 This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
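`PYTORCH_ROCM_ARCH` in the hunk above is a semicolon-separated list, which is why the value is quoted: an unquoted `;` would end the shell command after the first entry. A small sketch for inspecting the list before starting the build follows; `list_rocm_archs` is a hypothetical helper, and the note that `gfx90a` covers MI210/MI250 while `gfx942` covers MI300 is read off the build comment rather than stated by the diff.

```shell
#!/bin/sh
# Quoted on purpose: an unquoted ';' would terminate the assignment early.
PYTORCH_ROCM_ARCH="gfx90a;gfx942"

# Hypothetical helper: print one target architecture per line.
list_rocm_archs() {
    printf '%s\n' "$1" | tr ';' '\n'
}

list_rocm_archs "$PYTORCH_ROCM_ARCH"
```

Narrowing the list to the architecture actually present may shorten the build, at the cost of a less portable binary.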
@@ -154,7 +157,9 @@ It is important that the user kicks off the docker build using buildkit. Either
 To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:
 
 ```console
-DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm_base -t rocm/vllm-dev:base .
+DOCKER_BUILDKIT=1 docker build \
+    -f docker/Dockerfile.rocm_base \
+    -t rocm/vllm-dev:base .
 ```
 
 #### Build an image with vLLM
@@ -189,7 +194,11 @@ DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
 To build vllm on ROCm 6.3 for Radeon RX7900 series (gfx1100), you should pick the alternative base image:
 
 ```console
-DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f docker/Dockerfile.rocm -t vllm-rocm .
+DOCKER_BUILDKIT=1 docker build \
+    --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" \
+    -f docker/Dockerfile.rocm \
+    -t vllm-rocm \
+    .
 ```
 
 To run the above docker image `vllm-rocm`, use the below command: