Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -1,19 +1,20 @@
-# --8<-- [start:installation]
+<!-- markdownlint-disable MD041 -->
+--8<-- [start:installation]
 
 vLLM offers basic model inferencing and serving on Arm CPU platform, with support for NEON, data types FP32, FP16 and BF16.
 
-# --8<-- [end:installation]
-# --8<-- [start:requirements]
+--8<-- [end:installation]
+--8<-- [start:requirements]
 
 - OS: Linux
 - Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
 - Instruction Set Architecture (ISA): NEON support is required
 
-# --8<-- [end:requirements]
-# --8<-- [start:set-up-using-python]
+--8<-- [end:requirements]
+--8<-- [start:set-up-using-python]
 
-# --8<-- [end:set-up-using-python]
-# --8<-- [start:pre-built-wheels]
+--8<-- [end:set-up-using-python]
+--8<-- [start:pre-built-wheels]
 
 Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
@@ -43,13 +44,14 @@ uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VE
 
 The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
 
-**Install the latest code**
+#### Install the latest code
 
 LLM inference is a fast-evolving field, and the latest code may contain bug fixes, performance improvements, and new features that are not released yet. To allow users to try the latest code without waiting for the next release, vLLM provides working pre-built Arm CPU wheels for every commit since `v0.11.2` on <https://wheels.vllm.ai/nightly>. For native CPU wheels, this index should be used:
 
-* `https://wheels.vllm.ai/nightly/cpu/vllm`
+- `https://wheels.vllm.ai/nightly/cpu/vllm`
 
 To install from the nightly index, run:
 
 ```bash
 uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu --index-strategy first-index
 ```
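As a quick sanity check of the nightly-install command above, the following sketch only composes and prints the command without executing it, so no network access or Arm host is needed. The variable names are illustrative and not part of vLLM's docs:

```bash
# Dry-run sketch: compose the nightly-wheel install command described above.
NIGHTLY_INDEX="https://wheels.vllm.ai/nightly/cpu"
INSTALL_CMD="uv pip install vllm --extra-index-url ${NIGHTLY_INDEX} --index-strategy first-index"
echo "${INSTALL_CMD}"
```

Running the printed command with `uv` then resolves `vllm` from the nightly CPU index before the default PyPI index.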
@@ -64,7 +66,7 @@ uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu --index
 
 pip install https://wheels.vllm.ai/4fa7ce46f31cbd97b4651694caf9991cc395a259/vllm-0.13.0rc2.dev104%2Bg4fa7ce46f.cpu-cp38-abi3-manylinux_2_35_aarch64.whl # current nightly build (the filename will change!)
 ```
 
-**Install specific revisions**
+#### Install specific revisions
 
 If you want to access the wheels for previous commits (e.g. to bisect a behavior change or performance regression), you can specify the commit hash in the URL:
@@ -73,8 +75,8 @@ export VLLM_COMMIT=730bd35378bf2a5b56b6d3a45be28b3092d26519 # use full commit ha
 uv pip install vllm --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}/cpu --index-strategy first-index
 ```
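The per-commit index URL above is plain string composition: the full commit hash is spliced into the `wheels.vllm.ai` path. A minimal sketch, using the example hash from the snippet above and printing the resulting command rather than running it:

```bash
# Sketch: build the per-commit CPU wheel index URL from a full commit hash.
# The hash below is the example hash used in the docs above.
VLLM_COMMIT=730bd35378bf2a5b56b6d3a45be28b3092d26519
WHEEL_INDEX="https://wheels.vllm.ai/${VLLM_COMMIT}/cpu"
echo "uv pip install vllm --extra-index-url ${WHEEL_INDEX} --index-strategy first-index"
```

Note that an abbreviated hash will not work here; the index path requires the full 40-character commit hash.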
 
-# --8<-- [end:pre-built-wheels]
-# --8<-- [start:build-wheel-from-source]
+--8<-- [end:pre-built-wheels]
+--8<-- [start:build-wheel-from-source]
 
 First, install the recommended compiler. We recommend using `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04, you can run:
@@ -133,8 +135,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
 export LD_PRELOAD="$TC_PATH:$LD_PRELOAD"
 ```
 
-# --8<-- [end:build-wheel-from-source]
-# --8<-- [start:pre-built-images]
+--8<-- [end:build-wheel-from-source]
+--8<-- [start:pre-built-images]
 
 To pull the latest image from Docker Hub:
@@ -170,10 +172,10 @@ export VLLM_COMMIT=6299628d326f429eba78736acb44e76749b281f5 # use full commit ha
 docker pull public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${VLLM_COMMIT}-arm64-cpu
 ```
 
-# --8<-- [end:pre-built-images]
-# --8<-- [start:build-image-from-source]
+--8<-- [end:pre-built-images]
+--8<-- [start:build-image-from-source]
 
-## Building for your target ARM CPU
+#### Building for your target ARM CPU
 
 ```bash
 docker build -f docker/Dockerfile.cpu \
@@ -189,9 +191,9 @@ docker build -f docker/Dockerfile.cpu \
 - `VLLM_CPU_ARM_BF16=true` - Force-enable ARM BF16 support (build with BF16 regardless of build system capabilities)
 - `VLLM_CPU_ARM_BF16=false` - Rely on auto-detection (default)
 
-### Examples
+##### Examples
 
-**Auto-detection build (native ARM)**
+###### Auto-detection build (native ARM)
 
 ```bash
 # Building on ARM64 system - platform auto-detected
@@ -200,7 +202,7 @@ docker build -f docker/Dockerfile.cpu \
 --target vllm-openai .
 ```
 
-**Cross-compile for ARM with BF16 support**
+###### Cross-compile for ARM with BF16 support
 
 ```bash
 # Building on ARM64 for newer ARM CPUs with BF16
@@ -210,7 +212,7 @@ docker build -f docker/Dockerfile.cpu \
 --target vllm-openai .
 ```
 
-**Cross-compile from x86_64 to ARM64 with BF16**
+###### Cross-compile from x86_64 to ARM64 with BF16
 
 ```bash
 # Requires Docker buildx with ARM emulation (QEMU)
@@ -226,7 +228,7 @@ docker buildx build -f docker/Dockerfile.cpu \
 
 !!! note "ARM BF16 requirements"
     ARM BF16 support requires ARMv8.6-A or later (FEAT_BF16). Supported on AWS Graviton3/4, AmpereOne, and other recent ARM processors.
 
-## Launching the OpenAI server
+#### Launching the OpenAI server
 
 ```bash
 docker run --rm \
@@ -245,6 +247,6 @@ docker run --rm \
 
 !!! tip "Alternative to --privileged"
     Instead of `--privileged=true`, use `--cap-add SYS_NICE --security-opt seccomp=unconfined` for better security.
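The tip above swaps the broad `--privileged=true` for just the capability vLLM needs (`SYS_NICE`, for thread priority/affinity) plus an unconfined seccomp profile. A minimal dry-run sketch that prints the resulting invocation; the image name and port mapping are placeholders, not from vLLM's docs:

```bash
# Sketch of the less-privileged invocation suggested in the tip above.
# Image name and port are placeholders; adjust for your build.
SECURITY_FLAGS="--cap-add SYS_NICE --security-opt seccomp=unconfined"
echo "docker run --rm ${SECURITY_FLAGS} -p 8000:8000 vllm-openai-cpu"
```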
 
-# --8<-- [end:build-image-from-source]
-# --8<-- [start:extra-information]
-# --8<-- [end:extra-information]
+--8<-- [end:build-image-from-source]
+--8<-- [start:extra-information]
+--8<-- [end:extra-information]