[doc] Fold long code blocks to improve readability (#19926)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-23 13:24:23 +08:00
parent 493c275352
commit f17aec0d63
50 changed files with 3455 additions and 3180 deletions
--- a/docs/getting_started/installation/cpu.md
+++ b/docs/getting_started/installation/cpu.md
@@ -76,21 +76,23 @@ Currently, there are no pre-built CPU wheels.

 ### Build image from source

-```console
-$ docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .
+??? Commands

-# Launching OpenAI server 
-$ docker run --rm \
-             --privileged=true \
-             --shm-size=4g \
-             -p 8000:8000 \
-             -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
-             -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
-             vllm-cpu-env \
-             --model=meta-llama/Llama-3.2-1B-Instruct \
-             --dtype=bfloat16 \
-             other vLLM OpenAI server arguments
-```
+    ```console
+    $ docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .
+
+    # Launching OpenAI server 
+    $ docker run --rm \
+                --privileged=true \
+                --shm-size=4g \
+                -p 8000:8000 \
+                -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
+                -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
+                vllm-cpu-env \
+                --model=meta-llama/Llama-3.2-1B-Instruct \
+                --dtype=bfloat16 \
+                other vLLM OpenAI server arguments
+    ```

 !!! tip
    For ARM or Apple silicon, use `docker/Dockerfile.arm`
@@ -144,32 +146,34 @@ vllm serve facebook/opt-125m

 - When using the vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread to each physical CPU core, either by setting `VLLM_CPU_OMP_THREADS_BIND` or by relying on the auto thread binding feature, which is used by default. For example, on a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores:

-```console
-$ lscpu -e # check the mapping between logical CPU cores and physical CPU cores
+??? Commands

-# The "CPU" column means the logical CPU core IDs, and the "CORE" column means the physical core IDs. On this platform, two logical cores are sharing one physical core.
-CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ      MHZ
-0    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
-1    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
-2    0      0    2 2:2:2:0          yes 2401.0000 800.0000  800.000
-3    0      0    3 3:3:3:0          yes 2401.0000 800.0000  800.000
-4    0      0    4 4:4:4:0          yes 2401.0000 800.0000  800.000
-5    0      0    5 5:5:5:0          yes 2401.0000 800.0000  800.000
-6    0      0    6 6:6:6:0          yes 2401.0000 800.0000  800.000
-7    0      0    7 7:7:7:0          yes 2401.0000 800.0000  800.000
-8    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
-9    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
-10   0      0    2 2:2:2:0          yes 2401.0000 800.0000  800.000
-11   0      0    3 3:3:3:0          yes 2401.0000 800.0000  800.000
-12   0      0    4 4:4:4:0          yes 2401.0000 800.0000  800.000
-13   0      0    5 5:5:5:0          yes 2401.0000 800.0000  800.000
-14   0      0    6 6:6:6:0          yes 2401.0000 800.0000  800.000
-15   0      0    7 7:7:7:0          yes 2401.0000 800.0000  800.000
+    ```console
+    $ lscpu -e # check the mapping between logical CPU cores and physical CPU cores

-# On this platform, it is recommend to only bind openMP threads on logical CPU cores 0-7 or 8-15
-$ export VLLM_CPU_OMP_THREADS_BIND=0-7
-$ python examples/offline_inference/basic/basic.py
-```
+    # The "CPU" column means the logical CPU core IDs, and the "CORE" column means the physical core IDs. On this platform, two logical cores are sharing one physical core.
+    CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ      MHZ
+    0    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
+    1    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
+    2    0      0    2 2:2:2:0          yes 2401.0000 800.0000  800.000
+    3    0      0    3 3:3:3:0          yes 2401.0000 800.0000  800.000
+    4    0      0    4 4:4:4:0          yes 2401.0000 800.0000  800.000
+    5    0      0    5 5:5:5:0          yes 2401.0000 800.0000  800.000
+    6    0      0    6 6:6:6:0          yes 2401.0000 800.0000  800.000
+    7    0      0    7 7:7:7:0          yes 2401.0000 800.0000  800.000
+    8    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
+    9    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
+    10   0      0    2 2:2:2:0          yes 2401.0000 800.0000  800.000
+    11   0      0    3 3:3:3:0          yes 2401.0000 800.0000  800.000
+    12   0      0    4 4:4:4:0          yes 2401.0000 800.0000  800.000
+    13   0      0    5 5:5:5:0          yes 2401.0000 800.0000  800.000
+    14   0      0    6 6:6:6:0          yes 2401.0000 800.0000  800.000
+    15   0      0    7 7:7:7:0          yes 2401.0000 800.0000  800.000
+
+    # On this platform, it is recommended to bind OpenMP threads only to logical CPU cores 0-7 or 8-15
+    $ export VLLM_CPU_OMP_THREADS_BIND=0-7
+    $ python examples/offline_inference/basic/basic.py
+    ```

 - When using the vLLM CPU backend on a multi-socket machine with NUMA, set the CPU cores with `VLLM_CPU_OMP_THREADS_BIND` so that memory accesses do not cross NUMA nodes.
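
The hyper-threading advice in this hunk (one OpenMP thread per physical core) can be derived mechanically from `lscpu -e` output. The following is a minimal sketch, not part of vLLM: the helper names `one_cpu_per_core` and `to_bind_value` are hypothetical, the parser assumes the column layout shown in the example above (`CPU`, `SOCKET`, and `CORE` headers present, all cores online), and the comma-joined form for non-contiguous IDs is an assumption that should be checked against the vLLM documentation. Only the simple range form such as `0-7` appears in the source.

```python
# Hypothetical helpers: pick one logical CPU per physical core from
# `lscpu -e` text and format it for VLLM_CPU_OMP_THREADS_BIND.

SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ      MHZ
0    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
1    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
2    0      0    0 0:0:0:0          yes 2401.0000 800.0000  800.000
3    0      0    1 1:1:1:0          yes 2401.0000 800.0000  800.000
"""


def one_cpu_per_core(lscpu_output: str) -> list[int]:
    """Return the first logical CPU ID seen for each (socket, core) pair."""
    lines = [ln for ln in lscpu_output.splitlines() if ln.strip()]
    header = lines[0].split()
    cpu_col = header.index("CPU")
    socket_col = header.index("SOCKET")
    core_col = header.index("CORE")

    seen = set()
    chosen = []
    for ln in lines[1:]:
        fields = ln.split()
        key = (fields[socket_col], fields[core_col])
        if key not in seen:
            # First logical CPU on this physical core: keep it.
            seen.add(key)
            chosen.append(int(fields[cpu_col]))
    return sorted(chosen)


def to_bind_value(cpus: list[int]) -> str:
    """Collapse sorted CPU IDs into ranges, e.g. [0, 1, 2] becomes "0-2".

    Non-contiguous groups are joined with commas (an assumed format).
    """
    if not cpus:
        return ""
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    print(to_bind_value(one_cpu_per_core(SAMPLE)))
```

Applied to the 16-row listing in the hunk above, the same logic selects logical CPUs 0 through 7, which matches the `VLLM_CPU_OMP_THREADS_BIND=0-7` setting used there. On a multi-socket machine the selection would additionally need to be restricted to a single `NODE` value to follow the NUMA advice; that step is left out of this sketch.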