[CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
@@ -13,14 +13,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr
 To install the Cerebrium client, run:
 
 ```console
-$ pip install cerebrium
-$ cerebrium login
+pip install cerebrium
+cerebrium login
 ```
 
 Next, create your Cerebrium project, run:
 
 ```console
-$ cerebrium init vllm-project
+cerebrium init vllm-project
 ```
 
 Next, to install the required packages, add the following to your cerebrium.toml:
@@ -58,10 +58,10 @@ def run(prompts: list[str], temperature: float = 0.8, top_p: float = 0.95):
 Then, run the following code to deploy it to the cloud:
 
 ```console
-$ cerebrium deploy
+cerebrium deploy
 ```
 
-If successful, you should be returned a CURL command that you can call inference against. Just remember to end the url with the function name you are calling (in our case` /run`)
+If successful, you should be returned a CURL command that you can call inference against. Just remember to end the url with the function name you are calling (in our case`/run`)
 
 ```python
 curl -X POST https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run \

@@ -13,16 +13,16 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),
 To install dstack client, run:
 
 ```console
-$ pip install "dstack[all]
-$ dstack server
+pip install "dstack[all]
+dstack server
 ```
 
 Next, to configure your dstack project, run:
 
 ```console
-$ mkdir -p vllm-dstack
-$ cd vllm-dstack
-$ dstack init
+mkdir -p vllm-dstack
+cd vllm-dstack
+dstack init
 ```
 
 Next, to provision a VM instance with LLM of your choice (`NousResearch/Llama-2-7b-chat-hf` for this example), create the following `serve.dstack.yml` file for the dstack `Service`:

@@ -334,12 +334,12 @@ run: |
 
 1. Start the chat web UI:
 
-```console
-sky launch -c gui ./gui.yaml --env ENDPOINT=$(sky serve status --endpoint vllm)
-```
+```console
+sky launch -c gui ./gui.yaml --env ENDPOINT=$(sky serve status --endpoint vllm)
+```
 
 2. Then, we can access the GUI at the returned gradio link:
 
-```console
-| INFO | stdout | Running on public URL: https://6141e84201ce0bb4ed.gradio.live
-```
+```console
+| INFO | stdout | Running on public URL: https://6141e84201ce0bb4ed.gradio.live
+```
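
Most hunks in this commit drop the leading `$ ` prompt from commands inside `console` fences, which matches what markdownlint's MD014 rule (commands-show-output) asks for. The sketch below illustrates that transformation in Python. It is not the tooling used by this commit: the function name `strip_console_prompts` is hypothetical, and it simplifies the rule by stripping every prompt inside a `console` fence without checking whether the block also shows command output.

```python
import re

# Three backticks, built programmatically so this example can sit inside a fence.
FENCE_MARK = "`" * 3

# A leading "$ " prompt, optionally preceded by indentation.
PROMPT = re.compile(r"^(\s*)\$ ")


def strip_console_prompts(markdown: str) -> str:
    """Remove a leading "$ " from each line inside console-tagged code fences.

    Hypothetical helper for illustration only; lines outside console fences
    are returned unchanged.
    """
    out = []
    in_fence = False
    in_console = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE_MARK):
            if not in_fence:
                # Opening fence: remember whether it is tagged "console".
                in_fence = True
                in_console = line.strip() == FENCE_MARK + "console"
            else:
                # Closing fence.
                in_fence = False
                in_console = False
            out.append(line)
            continue
        if in_console:
            # Keep the indentation, drop only the prompt.
            line = PROMPT.sub(r"\1", line, count=1)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        FENCE_MARK + "console",
        "$ pip install cerebrium",
        "$ cerebrium login",
        FENCE_MARK,
    ])
    print(strip_console_prompts(sample))
```

Stripping only inside `console` fences leaves prose and other code blocks untouched, so a literal `$ ` elsewhere in a document would survive.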