[Docs] Reduce custom syntax used in docs (#27009)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 04:05:34 +01:00
parent 965c5f4914
commit 4ffd6e8942
65 changed files with 381 additions and 402 deletions
--- a/docs/getting_started/installation/cpu.arm.inc.md
+++ b/docs/getting_started/installation/cpu.arm.inc.md
@@ -0,0 +1,96 @@
+# --8<-- [start:installation]
+
+vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
+
+The ARM CPU backend currently supports the Float32, FP16, and BFloat16 data types.
+
+!!! warning
+    There are no pre-built wheels or images for this device, so you must build vLLM from source.
+
+# --8<-- [end:installation]
+# --8<-- [start:requirements]
+
+- OS: Linux
+- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
+- Instruction Set Architecture (ISA): NEON support is required
+
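+To check these requirements on a given machine, the following is a minimal sketch. It assumes a 64-bit ARM Linux kernel, which reports NEON as the `asimd` flag in `/proc/cpuinfo` (32-bit ARM kernels report `neon` instead):
+
+```bash
+# Print the NEON-related CPU feature flag if the kernel reports one
+# (assumption: the flag is exposed through /proc/cpuinfo)
+grep -m1 -o -w -E 'asimd|neon' /proc/cpuinfo || echo "NEON not reported"
+
+# Show the version of the default compiler
+gcc --version | head -n1
+```
+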
+# --8<-- [end:requirements]
+# --8<-- [start:set-up-using-python]
+
+# --8<-- [end:set-up-using-python]
+# --8<-- [start:pre-built-wheels]
+
+# --8<-- [end:pre-built-wheels]
+# --8<-- [start:build-wheel-from-source]
+
+First, install the recommended compiler. We recommend using `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04, you can run:
+
+```bash
+sudo apt-get update -y
+sudo apt-get install -y --no-install-recommends ccache git curl wget ca-certificates gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev ffmpeg libsm6 libxext6 libgl1 jq lsof
+sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
+```
+
+Second, clone the vLLM project:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git vllm_source
+cd vllm_source
+```
+
+Third, install required dependencies:
+
+```bash
+uv pip install -r requirements/cpu-build.txt --torch-backend cpu
+uv pip install -r requirements/cpu.txt --torch-backend cpu
+```
+
+??? console "pip"
+    ```bash
+    pip install --upgrade pip
+    pip install -v -r requirements/cpu-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
+    pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
+    ```
+
+Finally, build and install vLLM:
+
+```bash
+VLLM_TARGET_DEVICE=cpu uv pip install . --no-build-isolation
+```
+
+If you want to develop vLLM, install it in editable mode instead:
+
+```bash
+VLLM_TARGET_DEVICE=cpu uv pip install -e . --no-build-isolation
+```
+
+Compatibility has been tested on AWS Graviton3 instances.
+
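+After the build finishes, a quick import can confirm that the package is visible to Python. This is a sketch that assumes the environment used for the install is still active:
+
+```bash
+# Import vLLM and print the installed version
+python -c "import vllm; print(vllm.__version__)"
+```
+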
+# --8<-- [end:build-wheel-from-source]
+# --8<-- [start:pre-built-images]
+
+# --8<-- [end:pre-built-images]
+# --8<-- [start:build-image-from-source]
+```bash
+docker build -f docker/Dockerfile.cpu \
+        --tag vllm-cpu-env .
+
+# Launching OpenAI server
+docker run --rm \
+            --privileged=true \
+            --shm-size=4g \
+            -p 8000:8000 \
+            -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
+            -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
+            vllm-cpu-env \
+            --model=meta-llama/Llama-3.2-1B-Instruct \
+            --dtype=bfloat16 \
+            <other vLLM OpenAI server arguments>
+```
+
+!!! tip
+    An alternative to `--privileged=true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.
+
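+Once the container is up, requests like the following can confirm that the server is reachable. This is a sketch that assumes the `-p 8000:8000` port mapping and the model name from the `docker run` command above; the prompt and `max_tokens` value are illustrative:
+
+```bash
+# List the models served by the OpenAI-compatible API
+curl http://localhost:8000/v1/models
+
+# Send a small completion request
+curl http://localhost:8000/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{"model": "meta-llama/Llama-3.2-1B-Instruct", "prompt": "Hello, my name is", "max_tokens": 16}'
+```
+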
+# --8<-- [end:build-image-from-source]
+# --8<-- [start:extra-information]
+# --8<-- [end:extra-information]