[Doc] Documentation for distributed inference (#261)

This commit is contained in:
Zhuohan Li
2023-06-26 11:34:23 -07:00
committed by GitHub
parent 0b7db411b5
commit 2cf1a333b6
4 changed files with 54 additions and 3 deletions


@@ -28,7 +28,7 @@ vLLM is fast with:
 - State-of-the-art serving throughput
 - Efficient management of attention key and value memory with **PagedAttention**
-- Dynamic batching of incoming requests
+- Continuous batching of incoming requests
 - Optimized CUDA kernels

 vLLM is flexible and easy to use with:
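
Since this commit adds documentation for distributed inference, a minimal sketch of the usage pattern such docs typically cover may help; the model name, GPU count, and prompt below are illustrative assumptions, not taken from the diff shown here.

```python
# Minimal sketch of multi-GPU inference with vLLM's Python API.
# Assumes a single node with 4 GPUs; model and prompt are illustrative.
from vllm import LLM

# tensor_parallel_size shards the model weights across 4 GPUs.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)

# generate() returns a list of RequestOutput objects, one per prompt.
outputs = llm.generate("San Francisco is a")
print(outputs[0].outputs[0].text)
```

For online serving, the equivalent is a server-side flag, e.g. `--tensor-parallel-size 4` when launching vLLM's API server (flag name assumed from vLLM's CLI conventions at the time of this commit).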