[Doc][5/N] Move Community and API Reference to the bottom (#11896)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-01-10 11:10:12 +08:00
parent 36f5303578
commit c3cf54dda4
3 changed files with 40 additions and 26 deletions
--- a/docs/source/design/automatic_prefix_caching.md
+++ b/docs/source/design/automatic_prefix_caching.md
@@ -2,7 +2,7 @@

 # Automatic Prefix Caching

-The core idea of [PagedAttention](#design-paged-attention) is to partition the KV cache of each request into KV Blocks. Each block contains the attention keys and values for a fixed number of tokens. The PagedAttention algorithm allows these blocks to be stored in non-contiguous physical memory so that we can eliminate memory fragmentation by allocating the memory on demand.
+The core idea of [PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html) is to partition the KV cache of each request into KV Blocks. Each block contains the attention keys and values for a fixed number of tokens. The PagedAttention algorithm allows these blocks to be stored in non-contiguous physical memory so that we can eliminate memory fragmentation by allocating the memory on demand.

 To automatically cache the KV cache, we utilize the following key observation: Each KV block can be uniquely identified by the tokens within the block and the tokens in the prefix before the block.