Change the name to vLLM (#150)

Woosuk Kwon
2023-06-17 03:07:40 -07:00
committed by GitHub
parent e5464ee484
commit 0b98ba15c7
90 changed files with 342 additions and 339 deletions

@@ -1,4 +1,4 @@
-# CacheFlow
+# vLLM
## Build from source
@@ -28,7 +28,7 @@ python examples/simple_server.py --help
To start the server:
```bash
ray start --head
-python -m cacheflow.entrypoints.fastapi_server # --model <your_model>
+python -m vllm.entrypoints.fastapi_server # --model <your_model>
```
To test the server:
@@ -45,9 +45,9 @@ pip install gradio
Start the server:
```bash
-python -m cacheflow.http_frontend.fastapi_frontend
+python -m vllm.http_frontend.fastapi_frontend
# At another terminal
-python -m cacheflow.http_frontend.gradio_webserver
+python -m vllm.http_frontend.gradio_webserver
```
## Load LLaMA weights
@@ -62,5 +62,5 @@ Since LLaMA weight is not fully public, we cannot directly download the LLaMA we
2. For all the commands above, specify the model with `--model /output/path/llama-7b` to load the model. For example:
```bash
python simple_server.py --model /output/path/llama-7b
-python -m cacheflow.http_frontend.fastapi_frontend --model /output/path/llama-7b
+python -m vllm.http_frontend.fastapi_frontend --model /output/path/llama-7b
```