[Doc] Documentation for distributed inference (#261)
@@ -28,7 +28,7 @@ vLLM is fast with:
 
 - State-of-the-art serving throughput
 - Efficient management of attention key and value memory with **PagedAttention**
-- Dynamic batching of incoming requests
+- Continuous batching of incoming requests
 - Optimized CUDA kernels
 
 vLLM is flexible and easy to use with:
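The hunk above updates the README's feature list, replacing "dynamic batching" with the now-standard term "continuous batching". Since the commit itself adds documentation for distributed inference, here is a minimal sketch of the kind of usage that documentation covers, assuming vLLM's offline Python API; the model name and GPU count are illustrative assumptions, not taken from this commit:

```python
from vllm import LLM, SamplingParams

# Distributed inference in vLLM: tensor_parallel_size shards the
# model's weights across multiple GPUs on one node.
# The model name and GPU count here are illustrative assumptions.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() schedules the prompts with continuous batching under the hood.
outputs = llm.generate(["What is distributed inference?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

At the time of this commit, multi-GPU execution in vLLM was backed by Ray worker processes, which is the setup the new documentation page walks through.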