[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
 ```
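As a quick sanity check after a hunk like the one above, the server's OpenAI-compatible chat endpoint can be queried directly. This is a minimal sketch, assuming vLLM's default binding of localhost:8000:

```bash
# Send a one-shot chat request to the server started above
# (assumes the default host/port, localhost:8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen1.5-32B-Chat-AWQ",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```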
@@ -11,7 +11,7 @@ title: AutoGen
 - Set up the [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment

-```console
+```bash
 pip install vllm

 # Install AgentChat and OpenAI client from Extensions
@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
     --model mistralai/Mistral-7B-Instruct-v0.2
 ```
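Before wiring AutoGen to this server, it can help to confirm the model is actually registered. A minimal sketch, assuming the api_server above is listening on its default port 8000:

```bash
# List the models the server advertises;
# mistralai/Mistral-7B-Instruct-v0.2 should appear in the response
curl http://localhost:8000/v1/models
```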
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr
 To install the Cerebrium client, run:

-```console
+```bash
 pip install cerebrium
 cerebrium login
 ```

 Next, to create your Cerebrium project, run:

-```console
+```bash
 cerebrium init vllm-project
 ```
@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr
 Then, run the following command to deploy it to the cloud:

-```console
+```bash
 cerebrium deploy
 ```
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```
@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```

 - Start the Dify server with Docker Compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):

-```console
+```bash
 git clone https://github.com/langgenius/dify.git
 cd dify
 cd docker
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),
 To install the dstack client, run:

-```console
+```bash
 pip install "dstack[all]"
 dstack server
 ```

 Next, to configure your dstack project, run:

-```console
+```bash
 mkdir -p vllm-dstack
 cd vllm-dstack
 dstack init
@@ -13,7 +13,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 - Set up the vLLM and Haystack environment

-```console
+```bash
 pip install vllm haystack-ai
 ```
@@ -21,7 +21,7 @@ pip install vllm haystack-ai
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve mistralai/Mistral-7B-Instruct-v0.1
 ```
@@ -22,7 +22,7 @@ Before you begin, ensure that you have the following:
 To install the chart with the release name `test-vllm`:

-```console
+```bash
 helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
 ```
@@ -30,7 +30,7 @@ helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f val
 To uninstall the `test-vllm` deployment:

-```console
+```bash
 helm uninstall test-vllm --namespace=ns-vllm
 ```
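The workflow around either hunk can be verified with standard kubectl. A minimal sketch, assuming kubectl is configured against the cluster the chart was installed into:

```bash
# Pods should be Running after the install, and gone after the uninstall
kubectl get pods --namespace=ns-vllm
```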
@@ -18,7 +18,7 @@ And LiteLLM supports all models on VLLM.
 - Set up the vLLM and litellm environment

-```console
+```bash
 pip install vllm litellm
 ```
@@ -28,7 +28,7 @@ pip install vllm litellm
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```
@@ -56,7 +56,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 vllm serve BAAI/bge-base-en-v1.5
 ```
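Since vLLM exposes an OpenAI-compatible /v1/embeddings route for embedding models, the server above can be smoke-tested directly. A minimal sketch, assuming the default localhost:8000 binding:

```bash
# Request an embedding for a single input string
# (assumes the server above uses the default host/port, localhost:8000)
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-base-en-v1.5",
    "input": "vLLM is a fast inference engine for LLMs."
  }'
```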
@@ -7,13 +7,13 @@ title: Open WebUI
 2. Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 1. Start the [Open WebUI](https://github.com/open-webui/open-webui) docker container (replace the vllm serve host and vllm serve port):

-```console
+```bash
 docker run -d -p 3000:8080 \
     --name open-webui \
     -v open-webui:/app/backend/data \
@@ -15,7 +15,7 @@ Here are the integrations:
 - Set up the vLLM and langchain environment

-```console
+```bash
 pip install -U vllm \
     langchain_milvus langchain_openai \
     langchain_community beautifulsoup4 \
@@ -26,14 +26,14 @@ pip install -U vllm \
 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```
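With both services up, a quick check that each port serves the expected model can precede the langchain wiring. A minimal sketch, assuming both servers bind to localhost as above:

```bash
# The embedding model should be listed on port 8000,
# and the chat model on port 8001
curl http://localhost:8000/v1/models
curl http://localhost:8001/v1/models
```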
@@ -52,7 +52,7 @@ python retrieval_augmented_generation_with_langchain.py
 - Set up the vLLM and llamaindex environment

-```console
+```bash
 pip install vllm \
     llama-index llama-index-readers-web \
     llama-index-llms-openai-like \
@@ -64,14 +64,14 @@ pip install vllm \
 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```
@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
 - Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
 - Check that `sky check` shows clouds or Kubernetes are enabled.

-```console
+```bash
 pip install skypilot-nightly
 sky check
 ```
@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil
 Start serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
 ```
@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the
 **Optional**: Serve the 70B model instead of the default 8B and use more GPUs:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky launch serving.yaml \
   --gpus A100:8 \
@@ -159,7 +159,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut
 Start serving the Llama-3 8B model on multiple replicas:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky serve up -n vllm serving.yaml \
   --env HF_TOKEN
@@ -167,7 +167,7 @@ HF_TOKEN="your-huggingface-token" \
 Wait until the service is ready:

-```console
+```bash
 watch -n10 sky serve status vllm
 ```
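Once the replicas report ready, the deployment can be smoke-tested through the service endpoint. A minimal sketch, assuming the service exposes vLLM's OpenAI-compatible API (the `--endpoint` flag is used the same way in the GUI step further below):

```bash
# Resolve the public endpoint of the service and list the served models
ENDPOINT=$(sky serve status --endpoint vllm)
curl http://$ENDPOINT/v1/models
```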
@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.
 To update the service with the new config:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
 ```

 To stop the service:

-```console
+```bash
 sky serve down vllm
 ```
@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,
 1. Start the chat web UI:

-```console
+```bash
 sky launch \
   -c gui ./gui.yaml \
   --env ENDPOINT=$(sky serve status --endpoint vllm)
@@ -15,13 +15,13 @@ It can be quickly integrated with vLLM as a backend API server, enabling powerfu
 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 - Install streamlit and openai:

-```console
+```bash
 pip install streamlit openai
 ```
@@ -29,7 +29,7 @@ pip install streamlit openai
 - Start the streamlit web UI and start chatting:

-```console
+```bash
 streamlit run streamlit_openai_chatbot_webserver.py

 # or specify the VLLM_API_BASE or VLLM_API_KEY
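The trailing comment refers to environment variables read by the example script; a hedged invocation, assuming the vLLM server from the first step is on localhost:8000 and accepts any API key, might look like:

```bash
# VLLM_API_BASE and VLLM_API_KEY are the variables named in the
# snippet above; the values here are assumptions for a local setup
VLLM_API_BASE="http://localhost:8000/v1" \
VLLM_API_KEY="EMPTY" \
streamlit run streamlit_openai_chatbot_webserver.py
```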