Write README and front page of doc (#147)

This commit is contained in:
Woosuk Kwon
2023-06-18 03:19:38 -07:00
committed by GitHub
parent bf5f121c02
commit dcda03b4cb
9 changed files with 65 additions and 60 deletions

@@ -1,7 +1,21 @@
Welcome to vLLM!
================
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
**vLLM** is a fast and easy-to-use library for LLM inference and serving.
Its core features include:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with **PagedAttention**
- Seamless integration with popular HuggingFace models
- Dynamic batching of incoming requests
- Optimized CUDA kernels
- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
- Tensor parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
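As a rough illustration of one of the decoding algorithms listed above, here is a minimal, self-contained beam-search sketch over a toy per-token scorer. This is not vLLM's API; `beam_search` and `score_fn` are hypothetical names, and a real engine would score tokens with a model rather than a fixed rule:

```python
import math

def beam_search(score_fn, vocab, start, beam_width=2, max_len=4):
    # Each beam is (token sequence, cumulative log-probability).
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok in vocab:
                # Extend every beam with every vocabulary token.
                candidates.append((seq + [tok], logp + math.log(score_fn(seq, tok))))
        # Keep only the beam_width highest-scoring extensions.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams

# Toy scorer: prefers repeating the previous token.
def score_fn(seq, tok):
    return 0.7 if tok == seq[-1] else 0.3

best = beam_search(score_fn, vocab=["a", "b"], start="a")
# best[0] is the highest-scoring sequence: ["a", "a", "a", "a", "a"]
```

In contrast, *parallel sampling* would draw several independent continuations from the token distribution instead of keeping the top-scoring prefixes.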
For more information, please refer to our `blog post <>`_.
Documentation
-------------