Make distinct code and console admonitions so readers are less likely to miss them (#20585)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor
2025-07-08 03:55:28 +01:00
committed by GitHub
parent 31c5d0a1b7
commit af107d5a0e
52 changed files with 192 additions and 162 deletions
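For context, the documentation pages touched here use Material for MkDocs collapsible (`???`) admonitions. The old pages used ad-hoc capitalized labels (`??? Code`, `??? Yaml`, `??? Command`), which render as a generic fallback; this commit switches them to distinct `code` and `console` types with optional quoted titles. A rough sketch of the target syntax (assuming `code` and `console` are custom admonition types styled in vLLM's docs theme; the titles are illustrative):

````markdown
??? code "Config"

    ```yaml
    type: service
    ```

??? console "Command"

    ```console
    $ dstack run . -f serve.dstack.yml
    ```
````

Note that the nested fenced block must be indented four spaces so it stays inside the collapsible admonition.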

View File

@@ -30,7 +30,7 @@ python -m vllm.entrypoints.openai.api_server \
- Call it with AutoGen:
-??? Code
+??? code
```python
import asyncio

View File

@@ -34,7 +34,7 @@ vllm = "latest"
Next, to handle inference for the LLM of your choice (`mistralai/Mistral-7B-Instruct-v0.1` in this example), add the following code to your `main.py`:
-??? Code
+??? code
```python
from vllm import LLM, SamplingParams
@@ -64,7 +64,7 @@ cerebrium deploy
If successful, you will be returned a cURL command that you can use to call inference. Just remember to end the URL with the name of the function you are calling (in our case `/run`).
-??? Command
+??? console "Command"
```bash
curl -X POST https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run \
@@ -82,7 +82,7 @@ If successful, you should be returned a CURL command that you can call inference
You should get a response like:
-??? Response
+??? console "Response"
```json
{

View File

@@ -26,7 +26,7 @@ dstack init
Next, to provision a VM instance with the LLM of your choice (`NousResearch/Llama-2-7b-chat-hf` for this example), create the following `serve.dstack.yml` file for the dstack `Service`:
-??? Config
+??? code "Config"
```yaml
type: service
@@ -48,7 +48,7 @@ Next, to provision a VM instance with LLM of your choice (`NousResearch/Llama-2-
Then, run the following CLI for provisioning:
-??? Command
+??? console "Command"
```console
$ dstack run . -f serve.dstack.yml
@@ -79,7 +79,7 @@ Then, run the following CLI for provisioning:
After provisioning, you can interact with the model using the OpenAI SDK:
-??? Code
+??? code
```python
from openai import OpenAI

View File

@@ -27,7 +27,7 @@ vllm serve mistralai/Mistral-7B-Instruct-v0.1
- Use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server.
-??? Code
+??? code
```python
from haystack.components.generators.chat import OpenAIChatGenerator

View File

@@ -34,7 +34,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
- Call it with litellm:
-??? Code
+??? code
```python
import litellm

View File

@@ -17,7 +17,7 @@ vLLM can be deployed with [LWS](https://github.com/kubernetes-sigs/lws) on Kuber
Deploy the following yaml file `lws.yaml`
-??? Yaml
+??? code "Yaml"
```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
@@ -177,7 +177,7 @@ curl http://localhost:8080/v1/completions \
The output should be similar to the following
-??? Output
+??? console "Output"
```text
{

View File

@@ -24,7 +24,7 @@ sky check
See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypilot-org/skypilot/blob/master/llm/vllm/serve.yaml).
-??? Yaml
+??? code "Yaml"
```yaml
resources:
@@ -95,7 +95,7 @@ HF_TOKEN="your-huggingface-token" \
SkyPilot can scale the service up to multiple replicas with built-in autoscaling, load balancing, and fault tolerance. You can do this by adding a `service` section to the YAML file.
-??? Yaml
+??? code "Yaml"
```yaml
service:
@@ -111,7 +111,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut
max_completion_tokens: 1
```
-??? Yaml
+??? code "Yaml"
```yaml
service:
@@ -186,7 +186,7 @@ vllm 2 1 xx.yy.zz.245 18 mins ago 1x GCP([Spot]{'L4': 1}) R
After the service is READY, you can find a single endpoint for the service and access the service with the endpoint:
-??? Commands
+??? console "Commands"
```bash
ENDPOINT=$(sky serve status --endpoint 8081 vllm)
@@ -220,7 +220,7 @@ service:
This will scale up the service when the QPS exceeds 2 per replica.
-??? Yaml
+??? code "Yaml"
```yaml
service:
@@ -285,7 +285,7 @@ sky serve down vllm
It is also possible to access the Llama-3 service with a separate GUI frontend, so that user requests sent to the GUI are load-balanced across replicas.
-??? Yaml
+??? code "Yaml"
```yaml
envs: