[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
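Every hunk in this commit makes the same one-line change: an opening fence tagged `console` becomes `bash`. Highlighters typically treat `console` as a shell session transcript with prompts, so prompt-less commands are not highlighted as commands. The following is a minimal sketch of that mechanical rewrite, not the script the author used. The function name `retag_console_fences` is hypothetical, and it assumes that every `console` block in the input holds bare commands with no `$` prompts or captured output.

```python
import re

# Opening fence whose info string is exactly "console", optionally indented
# (fences nested inside list items are indented in Markdown).
CONSOLE_FENCE = re.compile(r"^([ \t]*)```console[ \t]*$")


def retag_console_fences(markdown: str) -> str:
    """Return `markdown` with every ```console opening fence retagged as ```bash.

    Assumption: the blocks contain bare commands only. Blocks that mix
    prompts and output should keep the `console` tag and are NOT detected here.
    """
    out = []
    for line in markdown.splitlines(keepends=True):
        # Separate the line ending so it can be preserved unchanged.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        match = CONSOLE_FENCE.match(body)
        if match:
            body = f"{match.group(1)}```bash"
        out.append(body + ending)
    return "".join(out)
```

Closing fences and fences with any other info string are left untouched, so the rewrite changes exactly one line per block, matching the shape of the hunks in this diff.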
@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
 - Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
 - Check that `sky check` shows clouds or Kubernetes are enabled.

-```console
+```bash
 pip install skypilot-nightly
 sky check
 ```
@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil

 Start the serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
 ```

@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the

 **Optional**: Serve the 70B model instead of the default 8B and use more GPU:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky launch serving.yaml \
 --gpus A100:8 \
@@ -159,7 +159,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut

 Start the serving the Llama-3 8B model on multiple replicas:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky serve up -n vllm serving.yaml \
 --env HF_TOKEN
@@ -167,7 +167,7 @@ HF_TOKEN="your-huggingface-token" \

 Wait until the service is ready:

-```console
+```bash
 watch -n10 sky serve status vllm
 ```

@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.

 To update the service with the new config:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
 ```

 To stop the service:

-```console
+```bash
 sky serve down vllm
 ```

@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,

 1. Start the chat web UI:

-```console
+```bash
 sky launch \
 -c gui ./gui.yaml \
 --env ENDPOINT=$(sky serve status --endpoint vllm)