[doc] format fix (#10789)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
@@ -25,7 +25,7 @@ With this mapping, we can add another indirection in vLLM’s KV cache managemen
|
||||
This design achieves automatic prefix caching without the need of maintaining a tree structure among the KV blocks. More specifically, all of the blocks are independent of each other and can be allocated and freed by itself, which enables us to manages the KV cache as ordinary caches in operating system.
|
||||
|
||||
|
||||
# Generalized Caching Policy
|
||||
## Generalized Caching Policy
|
||||
|
||||
Keeping all the KV blocks in a hash table enables vLLM to cache KV blocks from earlier requests to save memory and accelerate the computation of future requests. For example, if a new request shares the system prompt with the previous request, the KV cache of the shared prompt can directly be used for the new request without recomputation. However, the total KV cache space is limited and we have to decide which KV blocks to keep or evict when the cache is full.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user