# --8<-- [start:installation]

vLLM supports basic model inference and serving on the x86 CPU platform with the FP32, FP16, and BF16 data types.

# --8<-- [end:installation]
# --8<-- [start:requirements]

- OS: Linux
- CPU flags: `avx512f` (Recommended), `avx512_bf16` (Optional), `avx512_vnni` (Optional)

!!! tip
    Use `lscpu` to check the CPU flags.
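
As a rough sketch of the check suggested in the tip, the snippet below reports which of the three flags listed above are present. The function name `check_cpu_flags` is hypothetical and not part of vLLM. It assumes a Linux host where `/proc/cpuinfo` has a `flags` line, which carries the same flag names that `lscpu` prints.

```shell
# Hypothetical helper (not part of vLLM): report which AVX-512 flags
# from the requirements list appear in a space-separated flags string.
check_cpu_flags() {
    for flag in avx512f avx512_bf16 avx512_vnni; do
        # Pad with spaces so that e.g. avx512fp16 does not match avx512f.
        case " $1 " in
            *" $flag "*) echo "$flag: yes" ;;
            *)           echo "$flag: no" ;;
        esac
    done
}

# Assumption: the first "flags" line of /proc/cpuinfo describes every core.
if [ -r /proc/cpuinfo ]; then
    check_cpu_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

Matching whole words matters here because several AVX-512 flag names share a prefix.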

# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]

# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

--8<-- "docs/getting_started/installation/cpu/build.inc.md"

# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]

[https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo](https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo)

!!! warning
    If you deploy the pre-built images on machines without `avx512f`, `avx512_bf16`, or `avx512_vnni` support, an `Illegal instruction` error may be raised. For such machines, build the image yourself with the appropriate build arguments (e.g., `--build-arg VLLM_CPU_DISABLE_AVX512=true`, `--build-arg VLLM_CPU_AVX512BF16=false`, or `--build-arg VLLM_CPU_AVX512VNNI=false`) to disable the unsupported features. Without `avx512f`, AVX2 is used instead; this build is not recommended because it has only basic feature support.

# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

```bash
# Each build argument accepts false (default) or true.
docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=false \
        --build-arg VLLM_CPU_AVX512VNNI=false \
        --build-arg VLLM_CPU_DISABLE_AVX512=false \
        --tag vllm-cpu-env \
        --target vllm-openai .

# Launch the OpenAI-compatible server.
# Replace the <...> placeholders before running, and append any other
# vLLM OpenAI server arguments after --dtype.
docker run --rm \
            --security-opt seccomp=unconfined \
            --cap-add SYS_NICE \
            --shm-size=4g \
            -p 8000:8000 \
            -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
            -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
            vllm-cpu-env \
            --model=meta-llama/Llama-3.2-1B-Instruct \
            --dtype=bfloat16
```
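
The build arguments can also be derived from the host's CPU flags. The following is a minimal sketch, not an official vLLM script: `cpu_build_args` is a hypothetical helper, and the policy it applies is an assumption (disable AVX512 when `avx512f` is missing, and turn on the BF16 and VNNI options only when the matching flag is present alongside `avx512f`). It prints the suggested `docker build` command rather than running it.

```shell
# Hypothetical helper (not part of vLLM): map a space-separated CPU flags
# string to the --build-arg options used by docker/Dockerfile.cpu.
cpu_build_args() {
    disable_avx512=true
    bf16=false
    vnni=false
    case " $1 " in *" avx512f "*) disable_avx512=false ;; esac
    # Assumption: the optional extensions are only useful on top of avx512f.
    if [ "$disable_avx512" = false ]; then
        case " $1 " in *" avx512_bf16 "*) bf16=true ;; esac
        case " $1 " in *" avx512_vnni "*) vnni=true ;; esac
    fi
    echo "--build-arg VLLM_CPU_DISABLE_AVX512=$disable_avx512" \
         "--build-arg VLLM_CPU_AVX512BF16=$bf16" \
         "--build-arg VLLM_CPU_AVX512VNNI=$vnni"
}

# Print (do not run) a build command for the machine this runs on.
if [ -r /proc/cpuinfo ]; then
    host_flags="$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
    echo docker build -f docker/Dockerfile.cpu \
        $(cpu_build_args "$host_flags") \
        --tag vllm-cpu-env --target vllm-openai .
fi
```

Run this on the machine that will host the container, since the flags of the build machine and the deployment machine may differ.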

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]