Make distinct code and console admonitions so readers are less likely to miss them (#20585)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
@@ -76,7 +76,7 @@ Currently, there are no pre-built CPU wheels.
|
||||
|
||||
### Build image from source
|
||||
|
||||
??? Commands
|
||||
??? console "Commands"
|
||||
|
||||
```bash
|
||||
docker build -f docker/Dockerfile.cpu \
|
||||
@@ -149,7 +149,7 @@ vllm serve facebook/opt-125m
|
||||
|
||||
- If using vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread on each physical CPU core using `VLLM_CPU_OMP_THREADS_BIND` or using auto thread binding feature by default. On a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores:
|
||||
|
||||
??? Commands
|
||||
??? console "Commands"
|
||||
|
||||
```console
|
||||
$ lscpu -e # check the mapping between logical CPU cores and physical CPU cores
|
||||
|
||||
@@ -95,7 +95,7 @@ Currently, there are no pre-built ROCm wheels.
|
||||
|
||||
4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps:
|
||||
|
||||
??? Commands
|
||||
??? console "Commands"
|
||||
|
||||
```bash
|
||||
pip install --upgrade pip
|
||||
@@ -206,7 +206,7 @@ DOCKER_BUILDKIT=1 docker build \
|
||||
|
||||
To run the above docker image `vllm-rocm`, use the below command:
|
||||
|
||||
??? Command
|
||||
??? console "Command"
|
||||
|
||||
```bash
|
||||
docker run -it \
|
||||
|
||||
@@ -237,7 +237,7 @@ As an example, if a request of 3 sequences, with max sequence length of 412 come
|
||||
|
||||
Warmup is an optional, but highly recommended step occurring before vLLM server starts listening. It executes a forward pass for each bucket with dummy data. The goal is to pre-compile all graphs and not incur any graph compilation overheads within bucket boundaries during server runtime. Each warmup step is logged during vLLM startup:
|
||||
|
||||
??? Logs
|
||||
??? console "Logs"
|
||||
|
||||
```text
|
||||
INFO 08-01 22:26:47 hpu_model_runner.py:1066] [Warmup][Prompt][1/24] batch_size:4 seq_len:1024 free_mem:79.16 GiB
|
||||
@@ -286,7 +286,7 @@ When there's large amount of requests pending, vLLM scheduler will attempt to fi
|
||||
|
||||
Each described step is logged by vLLM server, as follows (negative values correspond to memory being released):
|
||||
|
||||
??? Logs
|
||||
??? console "Logs"
|
||||
|
||||
```text
|
||||
INFO 08-02 17:37:44 hpu_model_runner.py:493] Prompt bucket config (min, step, max_warmup) bs:[1, 32, 4], seq:[128, 128, 1024]
|
||||
|
||||
Reference in New Issue
Block a user