<!-- markdownlint-disable MD041 -->
--8<-- [start:installation]
vLLM supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16, and BF16.
--8<-- [end:installation]
--8<-- [start:requirements]
- OS: Linux
- CPU flags: `avx512f` (Recommended), `avx2` (Limited features)
!!! tip
Use `lscpu` to check the CPU flags.
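For instance, a quick check (a sketch; it relies only on `lscpu` and `grep` being available):

```bash
# Report whether AVX-512 or AVX2 is available on this machine
if lscpu | grep -qw avx512f; then
    echo "avx512f supported"
elif lscpu | grep -qw avx2; then
    echo "avx2 supported (limited features)"
else
    echo "no suitable AVX support detected"
fi
```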
--8<-- [end:requirements]
--8<-- [start:set-up-using-python]
--8<-- [end:set-up-using-python]
--8<-- [start:pre-built-wheels]
Pre-built vLLM wheels for x86 with AVX512/AVX2 have been available since version 0.17.0. To install the latest release wheel:
```bash
export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')
# use uv
uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cpu-cp38-abi3-manylinux_2_35_x86_64.whl --torch-backend cpu
```
??? console "pip"
```bash
# use pip
pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cpu-cp38-abi3-manylinux_2_35_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cpu
```
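The `VLLM_VERSION` pipeline above queries the GitHub API for the latest release tag and strips its leading `v`; the `sed` step alone behaves like this (using a made-up example tag):

```bash
# Strip the leading "v" from a release tag, e.g. "v0.17.0" -> "0.17.0"
TAG="v0.17.0"  # hypothetical example; the real tag comes from the GitHub API
VLLM_VERSION=$(printf '%s' "$TAG" | sed 's/^v//')
echo "$VLLM_VERSION"  # prints 0.17.0
```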
!!! warning "Set `LD_PRELOAD`"
Before using vLLM CPU installed via wheels, make sure TCMalloc and Intel OpenMP are installed and added to `LD_PRELOAD`:
```bash
# install TCMalloc, Intel OpenMP is installed with vLLM CPU
sudo apt-get install -y --no-install-recommends libtcmalloc-minimal4
# manually find the path
sudo find / -iname "*libtcmalloc_minimal.so.4"
sudo find / -iname "*libiomp5.so"
TC_PATH=...
IOMP_PATH=...
# add them to LD_PRELOAD
export LD_PRELOAD="$TC_PATH:$IOMP_PATH:$LD_PRELOAD"
```
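If you prefer not to fill in the paths by hand, the lookup can be scripted; this is a sketch assuming the libraries live under `/usr`, as with the Ubuntu packages above:

```bash
# Pick the first match for each library and prepend both to LD_PRELOAD
TC_PATH=$(find /usr -name 'libtcmalloc_minimal.so.4' 2>/dev/null | head -n1)
IOMP_PATH=$(find /usr -name 'libiomp5.so' 2>/dev/null | head -n1)
# ${VAR:+...} skips a library that was not found, avoiding stray colons
export LD_PRELOAD="${TC_PATH:+$TC_PATH:}${IOMP_PATH:+$IOMP_PATH:}$LD_PRELOAD"
echo "LD_PRELOAD=$LD_PRELOAD"
```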
#### Install the latest code
To install the wheel built from the latest main branch:
```bash
uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu --index-strategy first-index --torch-backend cpu
```
#### Install specific revisions
If you want to access the wheels for previous commits (e.g., to bisect a behavior change or performance regression), you can specify the commit hash in the URL:
```bash
export VLLM_COMMIT=730bd35378bf2a5b56b6d3a45be28b3092d26519 # use full commit hash from the main branch
uv pip install vllm --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}/cpu --index-strategy first-index --torch-backend cpu
```
--8<-- [end:pre-built-wheels]
--8<-- [start:build-wheel-from-source]
Install the recommended compiler. We recommend using `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04, you can run:
```bash
sudo apt-get update -y
sudo apt-get install -y gcc-12 g++-12 libnuma-dev
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
```
--8<-- "docs/getting_started/installation/python_env_setup.inc.md"
Clone the vLLM project:
```bash
git clone https://github.com/vllm-project/vllm.git vllm_source
cd vllm_source
```
Install the required dependencies:
```bash
uv pip install -r requirements/cpu-build.txt --torch-backend cpu
uv pip install -r requirements/cpu.txt --torch-backend cpu
```
??? console "pip"
```bash
pip install --upgrade pip
pip install -v -r requirements/cpu-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
```
Build and install vLLM:
```bash
VLLM_TARGET_DEVICE=cpu uv pip install . --no-build-isolation
```
If you want to develop vLLM, install it in editable mode instead.
```bash
VLLM_TARGET_DEVICE=cpu python3 setup.py develop
```
Optionally, build a portable wheel which you can then install elsewhere:
```bash
VLLM_TARGET_DEVICE=cpu uv build --wheel --no-build-isolation
```
```bash
uv pip install dist/*.whl
```
??? console "pip"
```bash
VLLM_TARGET_DEVICE=cpu python -m build --wheel --no-isolation
```
```bash
pip install dist/*.whl
```
!!! warning "Set `LD_PRELOAD`"
Before using vLLM CPU installed via wheels, make sure TCMalloc and Intel OpenMP are installed and added to `LD_PRELOAD`:
```bash
# install TCMalloc, Intel OpenMP is installed with vLLM CPU
sudo apt-get install -y --no-install-recommends libtcmalloc-minimal4
# manually find the path
sudo find / -iname "*libtcmalloc_minimal.so.4"
sudo find / -iname "*libiomp5.so"
TC_PATH=...
IOMP_PATH=...
# add them to LD_PRELOAD
export LD_PRELOAD="$TC_PATH:$IOMP_PATH:$LD_PRELOAD"
```
!!! example "Troubleshooting"
- **NumPy ≥2.0 error**: Downgrade using `pip install "numpy<2.0"`.
- **CMake picks up CUDA**: Add `CMAKE_DISABLE_FIND_PACKAGE_CUDA=ON` to prevent CUDA detection during CPU builds, even if CUDA is installed.
- **AMD**: running vLLM on CPU with [AVX512](https://www.phoronix.com/review/amd-zen4-avx512) requires a 4th-gen processor (Zen 4/Genoa) or newer.
- If you receive an error such as `Could not find a version that satisfies the requirement torch==X.Y.Z+cpu+cpu`, consider updating [pyproject.toml](https://github.com/vllm-project/vllm/blob/main/pyproject.toml) to help pip resolve the dependency.
```toml title="pyproject.toml"
[build-system]
requires = [
"cmake>=3.26.1",
...
"torch==X.Y.Z+cpu" # <-------
]
```
--8<-- [end:build-wheel-from-source]
--8<-- [start:pre-built-images]
You can pull the latest available CPU image from Docker Hub:
```bash
docker pull vllm/vllm-openai-cpu:latest-x86_64
```
To pull an image for a specific vLLM version:
```bash
export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')
docker pull vllm/vllm-openai-cpu:v${VLLM_VERSION}-x86_64
```
All available image tags are here: [https://hub.docker.com/r/vllm/vllm-openai-cpu/tags](https://hub.docker.com/r/vllm/vllm-openai-cpu/tags)
You can run these images via:
```bash
docker run \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--env "HF_TOKEN=<secret>" \
vllm/vllm-openai-cpu:latest-x86_64 <args...>
```
--8<-- [end:pre-built-images]
--8<-- [start:build-image-from-source]
#### Building for your target CPU
```bash
# Set VLLM_CPU_X86=true to cross-compile for x86 from another architecture
docker build -f docker/Dockerfile.cpu \
--build-arg VLLM_CPU_X86=<false (default)|true> \
--tag vllm-cpu-env \
--target vllm-openai .
```
#### Launching the OpenAI server
```bash
docker run --rm \
--security-opt seccomp=unconfined \
--cap-add SYS_NICE \
--shm-size=4g \
-p 8000:8000 \
-e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
vllm-cpu-env \
meta-llama/Llama-3.2-1B-Instruct \
--dtype=bfloat16 \
<other vLLM OpenAI server arguments>
```
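For example, `VLLM_CPU_KVCACHE_SPACE` takes a size in GiB; this sketch reserves 40 GiB, assuming the host has that much RAM to spare:

```bash
# Reserve 40 GiB for the KV cache (passed into the container via -e above);
# size this to fit within host RAM alongside the model weights
export VLLM_CPU_KVCACHE_SPACE=40
echo "$VLLM_CPU_KVCACHE_SPACE"
```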
--8<-- [end:build-image-from-source]
--8<-- [start:extra-information]
--8<-- [end:extra-information]