[Doc][5/N] Move Community and API Reference to the bottom (#11896)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Simon Mo <simon.mo@hey.com>
This commit is contained in:
Cyrus Leung
2025-01-10 11:10:12 +08:00
committed by GitHub
parent 36f5303578
commit c3cf54dda4
3 changed files with 40 additions and 26 deletions

View File

@@ -41,7 +41,7 @@ vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is fast with:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with **PagedAttention**
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8.