Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

docs/getting_started/installation/cpu/apple.inc.md (new file, 67 lines)
@@ -0,0 +1,67 @@
# --8<-- [start:installation]

vLLM has experimental support for macOS with Apple silicon. For now, users must build vLLM from source to run it natively on macOS.

Currently the CPU implementation for macOS supports the FP32 and FP16 datatypes.

!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

# --8<-- [end:installation]
# --8<-- [start:requirements]

- OS: `macOS Sonoma` or later
- SDK: `XCode 15.4` or later with Command Line Tools
- Compiler: `Apple Clang >= 15.0.0`

# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]

# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

After installing XCode and the Command Line Tools, which include Apple Clang, execute the following commands to build and install vLLM from source:

```console
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/cpu.txt
pip install -e .
```

!!! note
    On macOS, `VLLM_TARGET_DEVICE` is automatically set to `cpu`, which is currently the only supported device.
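
    For illustration only, this is equivalent to setting the variable explicitly (a sketch; you do not need to do this yourself on macOS):

    ```console
    VLLM_TARGET_DEVICE=cpu pip install -e .
    ```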

#### Troubleshooting

If the build fails with errors like the following snippet, where standard C++ headers cannot be found, try removing and reinstalling your
[Command Line Tools for Xcode](https://developer.apple.com/download/all/).

```text
[...] fatal error: 'map' file not found
    1 | #include <map>
      |          ^~~~~
1 error generated.
[2/8] Building CXX object CMakeFiles/_C.dir/csrc/cpu/pos_encoding.cpp.o

[...] fatal error: 'cstddef' file not found
   10 | #include <cstddef>
      |          ^~~~~~~~~
1 error generated.
```
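
A common way to do that (standard macOS commands, shown here as a sketch):

```console
sudo rm -rf /Library/Developer/CommandLineTools   # remove the existing tools
xcode-select --install                            # trigger a fresh install
```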

# --8<-- [end:build-wheel-from-source]
# --8<-- [start:set-up-using-docker]

# --8<-- [end:set-up-using-docker]
# --8<-- [start:pre-built-images]

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]

docs/getting_started/installation/cpu/arm.inc.md (new file, 41 lines)
@@ -0,0 +1,41 @@
# --8<-- [start:installation]

vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.

The ARM CPU backend currently supports the Float32, FP16 and BFloat16 datatypes.

!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

# --8<-- [end:installation]
# --8<-- [start:requirements]

- OS: Linux
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
- Instruction Set Architecture (ISA): NEON support is required (see the check below)
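
One way to confirm NEON support (an optional check; on ARM64 Linux the kernel reports NEON as the `asimd` feature flag):

```console
grep -o asimd /proc/cpuinfo | head -n 1   # prints "asimd" if NEON/ASIMD is available
```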

# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]

# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

--8<-- "docs/getting_started/installation/cpu/build.inc.md"

Testing has been conducted on AWS Graviton3 instances for compatibility.

# --8<-- [end:build-wheel-from-source]
# --8<-- [start:set-up-using-docker]

# --8<-- [end:set-up-using-docker]
# --8<-- [start:pre-built-images]

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]

docs/getting_started/installation/cpu/build.inc.md (new file, 36 lines)
@@ -0,0 +1,36 @@
First, install the recommended compiler. We recommend using `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04 you can run:

```console
sudo apt-get update -y
sudo apt-get install -y gcc-12 g++-12 libnuma-dev python3-dev
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
```
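
You can confirm the default compiler afterwards (an optional check):

```console
gcc --version   # should report gcc 12.x after the update-alternatives step
```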

Second, clone the vLLM project:

```console
git clone https://github.com/vllm-project/vllm.git vllm_source
cd vllm_source
```

Third, install the Python packages needed to build the vLLM CPU backend:

```console
pip install --upgrade pip
pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy
pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
```

Finally, build and install the vLLM CPU backend:

```console
VLLM_TARGET_DEVICE=cpu python setup.py install
```

If you want to develop vLLM, install it in editable mode instead:

```console
VLLM_TARGET_DEVICE=cpu python setup.py develop
```
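
To sanity-check the result, a quick smoke test (assumes the build completed successfully):

```console
python -c "import vllm; print(vllm.__version__)"   # prints the installed vLLM version
```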

# --8<-- [end:extra-information]

docs/getting_started/installation/cpu/s390x.inc.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# --8<-- [start:installation]

vLLM has experimental support for the s390x architecture on the IBM Z platform. For now, users must build vLLM from source to run it natively on IBM Z.

Currently the CPU implementation for the s390x architecture supports the FP32 datatype only.

!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

# --8<-- [end:installation]
# --8<-- [start:requirements]

- OS: `Linux`
- SDK: `gcc/g++ >= 12.3.0` with Command Line Tools
- Instruction Set Architecture (ISA): VXE support is required. Works with Z14 and above.
- Python packages that must be built and installed: `pyarrow`, `torch` and `torchvision`

# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]

# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

Install the following packages from the package manager before building vLLM. For example, on RHEL 9.4:

```console
dnf install -y \
    which procps findutils tar vim git gcc g++ make patch cython zlib-devel \
    libjpeg-turbo-devel libtiff-devel libpng-devel libwebp-devel freetype-devel harfbuzz-devel \
    openssl-devel openblas openblas-devel wget autoconf automake libtool cmake numactl-devel
```

Install Rust >= 1.80, which is needed to install the `outlines-core` and `uvloop` Python packages:

```console
curl https://sh.rustup.rs -sSf | sh -s -- -y && \
    . "$HOME/.cargo/env"
```
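
You can verify the toolchain version (an optional check):

```console
rustc --version   # should report 1.80 or newer
```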

Execute the following commands to build and install vLLM from source.

!!! tip
    Please build the dependencies `torchvision` and `pyarrow` from source before building vLLM.

```console
sed -i '/^torch/d' requirements-build.txt    # remove torch from requirements-build.txt since we use nightly builds
pip install -v \
    --extra-index-url https://download.pytorch.org/whl/nightly/cpu \
    -r requirements-build.txt \
    -r requirements-cpu.txt
VLLM_TARGET_DEVICE=cpu python setup.py bdist_wheel && \
pip install dist/*.whl
```

# --8<-- [end:build-wheel-from-source]
# --8<-- [start:set-up-using-docker]

# --8<-- [end:set-up-using-docker]
# --8<-- [start:pre-built-images]

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]

docs/getting_started/installation/cpu/x86.inc.md (new file, 46 lines)
@@ -0,0 +1,46 @@
# --8<-- [start:installation]

vLLM supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16 and BF16.

!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

# --8<-- [end:installation]
# --8<-- [start:requirements]

- OS: Linux
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
- Instruction Set Architecture (ISA): AVX512 (optional, recommended)

!!! tip
    [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date feature optimizations for an extra performance boost on Intel hardware.
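
    IPEX is distributed as a regular Python package; a minimal installation sketch (check the IPEX repository for the build matching your PyTorch version):

    ```console
    pip install intel-extension-for-pytorch
    ```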

# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]

# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

--8<-- "docs/getting_started/installation/cpu/build.inc.md"

!!! note
    - AVX512_BF16 is an ISA extension that provides native BF16 data type conversion and vector product instructions, which brings some performance improvement compared with pure AVX512. The CPU backend build script checks the host CPU flags to determine whether to enable AVX512_BF16 (see the check below).
    - If you want to force-enable AVX512_BF16 for cross-compilation, set the environment variable `VLLM_CPU_AVX512BF16=1` before building.
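
    A quick way to inspect the relevant CPU flags on the host (an optional check on Linux):

    ```console
    grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u   # look for avx512f, avx512_bf16, etc.
    ```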

# --8<-- [end:build-wheel-from-source]
# --8<-- [start:set-up-using-docker]

# --8<-- [end:set-up-using-docker]
# --8<-- [start:pre-built-images]

See [https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo](https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo)

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]