[CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
This commit is contained in:
@@ -8,7 +8,7 @@ Due to the auto-regressive nature of transformer architecture, there are times w
|
||||
The vLLM can preempt requests to free up KV cache space for other requests. Preempted requests are recomputed when sufficient KV cache space becomes
|
||||
available again. When this occurs, the following warning is printed:
|
||||
|
||||
```
|
||||
```text
|
||||
WARNING 05-09 00:49:33 scheduler.py:1057 Sequence group 0 is preempted by PreemptionMode.SWAP mode because there is not enough KV cache space. This can affect the end-to-end performance. Increase gpu_memory_utilization or tensor_parallel_size to provide more KV cache memory. total_cumulative_preemption_cnt=1
|
||||
```
|
||||
|
||||
|
||||
Reference in New Issue
Block a user