[doc] improve readability (#18675)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Reid
2025-05-25 16:40:31 +08:00
committed by GitHub
parent 624b77a2b3
commit 279f854519
20 changed files with 206 additions and 59 deletions

@@ -82,7 +82,11 @@ Check the output of the command. There will be a shareable gradio link (like the
**Optional**: Serve the 70B model instead of the default 8B and use more GPU:
```console
-HF_TOKEN="your-huggingface-token" sky launch serving.yaml --gpus A100:8 --env HF_TOKEN --env MODEL_NAME=meta-llama/Meta-Llama-3-70B-Instruct
+HF_TOKEN="your-huggingface-token" \
+sky launch serving.yaml \
+--gpus A100:8 \
+--env HF_TOKEN \
+--env MODEL_NAME=meta-llama/Meta-Llama-3-70B-Instruct
```
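Review note: the multi-line form relies on backslash line continuation, so it should pass exactly the same arguments as the original one-liner. The sketch below checks that with a hypothetical `show_args` function standing in for `sky`, and a hypothetical `DEMO_TOKEN` variable standing in for `HF_TOKEN`; neither is part of this change.

```shell
# show_args is a hypothetical stand-in for `sky`: it prints one argument per line.
show_args() { printf '%s\n' "$@"; }

# Single-line invocation, as in the removed line.
one_line=$(show_args launch serving.yaml --gpus A100:8 --env HF_TOKEN)

# Multi-line invocation, as in the added lines. Each backslash must be the
# last character on its line for the continuation to work.
multi_line=$(show_args launch serving.yaml \
  --gpus A100:8 \
  --env HF_TOKEN)

# A leading VAR=value assignment still applies to the command when the
# command itself starts on the next line after a backslash.
token_seen=$(DEMO_TOKEN="abc" \
  sh -c 'printf "%s" "$DEMO_TOKEN"')

[ "$one_line" = "$multi_line" ] && echo "identical arguments"
```

One caveat when copying the wrapped commands: a trailing space after any backslash ends the continuation, and the shell then runs the remaining lines as separate commands.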
## Scale up to multiple replicas
@@ -155,7 +159,9 @@ run: |
Start serving the Llama-3 8B model on multiple replicas:
```console
-HF_TOKEN="your-huggingface-token" sky serve up -n vllm serving.yaml --env HF_TOKEN
+HF_TOKEN="your-huggingface-token" \
+sky serve up -n vllm serving.yaml \
+--env HF_TOKEN
```
Wait until the service is ready:
@@ -318,7 +324,9 @@ run: |
1. Start the chat web UI:
```console
-sky launch -c gui ./gui.yaml --env ENDPOINT=$(sky serve status --endpoint vllm)
+sky launch \
+-c gui ./gui.yaml \
+--env ENDPOINT=$(sky serve status --endpoint vllm)
```
2. Then, we can access the GUI at the returned gradio link: