Write README and front page of doc (#147)
@@ -1,7 +1,21 @@
 Welcome to vLLM!
 ================
 
-vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLM).
+**vLLM** is a fast and easy-to-use library for LLM inference and serving.
+Its core features include:
+
+- State-of-the-art performance in serving throughput
+- Efficient management of attention key and value memory with **PagedAttention**
+- Seamless integration with popular HuggingFace models
+- Dynamic batching of incoming requests
+- Optimized CUDA kernels
+- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
+- Tensor parallelism support for distributed inference
+- Streaming outputs
+- OpenAI-compatible API server
+
+For more information, please refer to our `blog post <>`_.
+
 
 Documentation
 -------------
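For a sense of what the "fast and easy-to-use" claim in the diff above looks like in practice, here is a minimal offline-inference sketch against vLLM's public Python API (`LLM` and `SamplingParams`); the model name and sampling values are illustrative assumptions, not part of this commit.

```python
# Minimal offline-inference sketch using vLLM's public Python API.
# The model name and sampling values are illustrative, not from this commit.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# SamplingParams controls decoding; the parallel sampling and beam search
# mentioned in the feature list are configured through objects like this.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# LLM loads a HuggingFace model and manages attention key/value memory
# with PagedAttention. Tensor parallelism for distributed inference is
# exposed the same way, via a constructor argument such as
# tensor_parallel_size=2 (an assumption based on later vLLM docs).
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```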
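The "OpenAI-compatible API server" bullet means a running vLLM server accepts the same request shape as OpenAI's completions endpoint. A hedged sketch of querying one over plain HTTP, assuming a server launched separately; the launch command, port, and model name are typical-setup assumptions rather than anything this commit specifies.

```python
# Querying a running vLLM OpenAI-compatible server with plain HTTP.
# Assumes the server was started separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
# The entrypoint, port, and model name are assumptions, not from this commit.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["text"])
```

Because the request and response formats mirror OpenAI's, existing OpenAI client code can typically be pointed at such a server by changing only the base URL.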