[doc] format fix (#10789)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2024-11-30 19:38:40 +08:00
committed by GitHub
parent e7cfc4ef4c
commit 7e4bbda573
2 changed files with 19 additions and 19 deletions


@@ -25,7 +25,7 @@ With this mapping, we can add another indirection in vLLMs KV cache managemen
This design achieves automatic prefix caching without the need to maintain a tree structure among the KV blocks. More specifically, all of the blocks are independent of each other and can be allocated and freed independently, which enables us to manage the KV cache like ordinary caches in operating systems.
# Generalized Caching Policy
## Generalized Caching Policy
Keeping all the KV blocks in a hash table enables vLLM to cache KV blocks from earlier requests to save memory and accelerate the computation of future requests. For example, if a new request shares the system prompt with a previous request, the KV cache of the shared prompt can be used directly for the new request without recomputation. However, the total KV cache space is limited, and we have to decide which KV blocks to keep or evict when the cache is full.
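The hash-table lookup and eviction described above can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation: the class name, the use of Python's built-in `hash`, and the LRU eviction policy are all assumptions for the sake of the example (vLLM's eviction policy is more general than plain LRU).

```python
from collections import OrderedDict

class KVBlockCache:
    """Illustrative sketch of hash-based KV block caching with LRU eviction.

    Each block is keyed by a hash of the full token prefix up to and
    including that block, so a new request whose prefix matches a cached
    block can reuse it instead of recomputing. Names and structure here
    are hypothetical, not vLLM's API.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        # hash of token prefix -> KV block (stubbed as a list of ints)
        self.blocks: "OrderedDict[int, list]" = OrderedDict()

    def block_hash(self, prefix_tokens: tuple) -> int:
        # Hashing the entire prefix means equal hashes imply an
        # identical preceding context, which is what makes the
        # cached KV values reusable.
        return hash(prefix_tokens)

    def lookup_or_insert(self, prefix_tokens: tuple, kv_block: list) -> list:
        h = self.block_hash(prefix_tokens)
        if h in self.blocks:
            # Cache hit: reuse the cached block and refresh its LRU position.
            self.blocks.move_to_end(h)
            return self.blocks[h]
        if len(self.blocks) >= self.capacity:
            # Cache full: evict the least-recently-used block.
            self.blocks.popitem(last=False)
        self.blocks[h] = kv_block
        return kv_block
```

For example, a second request sharing the prompt `(1, 2)` returns the block cached by the first request rather than the newly computed one, while a block evicted under memory pressure must be recomputed.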