# --8<-- [start:installation]
vLLM supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16, and BF16.
# --8<-- [end:installation]
# --8<-- [start:requirements]
- OS: Linux
- CPU flags: `avx512f` (Recommended), `avx512_bf16` (Optional), `avx512_vnni` (Optional)
!!! tip
Use `lscpu` to check the CPU flags.
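
The flag check can also be scripted; a minimal sketch (the `flags` string below is illustrative — on a real machine take it from `lscpu` or `/proc/cpuinfo`):

```bash
# Report which AVX-512 features relevant to vLLM's CPU backend are present.
# Illustrative input; on a real machine use: flags=$(grep -m1 '^flags' /proc/cpuinfo)
flags="fpu vme sse4_2 avx2 avx512f avx512_vnni"

result=""
for f in avx512f avx512_bf16 avx512_vnni; do
    case " $flags " in
        *" $f "*) status="present" ;;
        *)        status="missing" ;;
    esac
    echo "$f: $status"
    result="$result $f=$status"
done
```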
# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
--8<-- "docs/getting_started/installation/cpu/build.inc.md"
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
[https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo](https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo)
!!! warning
    If deploying the pre-built images on machines without `avx512f`, `avx512_bf16`, or `avx512_vnni` support, an `Illegal instruction` error may be raised. For such machines, it is recommended to build images with the appropriate build arguments (e.g., `--build-arg VLLM_CPU_DISABLE_AVX512=true`, `--build-arg VLLM_CPU_AVX512BF16=false`, or `--build-arg VLLM_CPU_AVX512VNNI=false`) to disable the unsupported features. Note that without `avx512f`, AVX2 is used instead; this build is not recommended, as it has only basic feature support.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash
docker build -f docker/Dockerfile.cpu \
--build-arg VLLM_CPU_AVX512BF16=false (default)|true \
--build-arg VLLM_CPU_AVX512VNNI=false (default)|true \
--build-arg VLLM_CPU_DISABLE_AVX512=false (default)|true \
--tag vllm-cpu-env \
--target vllm-openai .
# Launching OpenAI server
docker run --rm \
--privileged=true \
--shm-size=4g \
-p 8000:8000 \
-e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
-e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
vllm-cpu-env \
--model=meta-llama/Llama-3.2-1B-Instruct \
--dtype=bfloat16 \
other vLLM OpenAI server arguments
```
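
As a concrete invocation, the placeholders above might be filled in as follows (the KV cache size in GiB and the core list are illustrative placeholders, not tuned recommendations):

```bash
# Illustrative values: 40 GiB KV cache space, OpenMP threads bound to cores 0-29.
docker run --rm \
    --privileged=true \
    --shm-size=4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=40 \
    -e VLLM_CPU_OMP_THREADS_BIND=0-29 \
    vllm-cpu-env \
    --model=meta-llama/Llama-3.2-1B-Instruct \
    --dtype=bfloat16
```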
# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]