[doc] Fold long code blocks to improve readability (#19926)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Reid
2025-06-23 13:24:23 +08:00
committed by GitHub
parent 493c275352
commit f17aec0d63
50 changed files with 3455 additions and 3180 deletions


@@ -29,85 +29,89 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
First, create a Kubernetes PVC and Secret for downloading and storing the Hugging Face model:
??? Config

    ```bash
    cat <<EOF |kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: vllm-models
    spec:
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 50Gi
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: hf-token-secret
    type: Opaque
    data:
      token: $(HF_TOKEN)
    EOF
    ```
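Optionally, confirm both objects exist before continuing (the Secret's `token` value under `data:` must be base64-encoded); a quick check, assuming the default namespace:

```bash
kubectl get pvc vllm-models
kubectl get secret hf-token-secret
```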
Next, start the vLLM server as a Kubernetes Deployment and Service:
??? Config

    ```bash
    cat <<EOF |kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vllm-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: vllm
      template:
        metadata:
          labels:
            app.kubernetes.io/name: vllm
        spec:
          containers:
          - name: vllm
            image: vllm/vllm-openai:latest
            command: ["/bin/sh", "-c"]
            args: [
              "vllm serve meta-llama/Llama-3.2-1B-Instruct"
            ]
            env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret
                  key: token
            ports:
            - containerPort: 8000
            volumeMounts:
            - name: llama-storage
              mountPath: /root/.cache/huggingface
          volumes:
          - name: llama-storage
            persistentVolumeClaim:
              claimName: vllm-models
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: vllm-server
    spec:
      selector:
        app.kubernetes.io/name: vllm
      ports:
      - protocol: TCP
        port: 8000
        targetPort: 8000
      type: ClusterIP
    EOF
    ```
We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):
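For example, using the names from the manifests above:

```bash
# Wait for the Deployment to become available, then tail the server logs
kubectl rollout status deployment/vllm-server
kubectl logs -l app.kubernetes.io/name=vllm -f
```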
@@ -128,6 +132,9 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
The PVC is used to store the model cache; it is optional, and you can use hostPath or other storage options instead (a minimal hostPath sketch follows the folded YAML below).
<details>
<summary>Yaml</summary>
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
@@ -144,6 +151,8 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
volumeMode: Filesystem
```
</details>
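If you prefer not to provision a PVC, a hostPath volume can stand in for local experiments. A minimal sketch; the volume name and node path here are illustrative assumptions, and note that hostPath ties the cache to a single node:

```yaml
# Sketch: hostPath alternative to the PVC (names/paths are assumed examples)
volumes:
  - name: model-cache             # match the volume name your Deployment mounts
    hostPath:
      path: /data/huggingface     # assumed node-local directory
      type: DirectoryOrCreate
```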
The Secret is optional and only required for accessing gated models; skip this step if you are not using gated models. An imperative alternative using `kubectl create secret` is sketched after the YAML.
```yaml
@@ -156,13 +165,16 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
stringData:
token: "REPLACE_WITH_TOKEN"
```
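Alternatively, the same Secret can be created imperatively, which avoids putting the token in a file; a minimal sketch, assuming the token is exported as `HF_TOKEN`:

```bash
# kubectl handles the base64 encoding of --from-literal values
kubectl create secret generic hf-token-secret \
  --from-literal=token="$HF_TOKEN"
```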
Next, create the deployment file for vLLM to run the model server. The following example deploys the `Mistral-7B-Instruct-v0.3` model. Two variants are shown, one for NVIDIA GPUs and one for AMD GPUs.
NVIDIA GPU:
<details>
<summary>Yaml</summary>
```yaml
apiVersion: apps/v1
kind: Deployment
@@ -233,10 +245,15 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
periodSeconds: 5
```
</details>
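The folded manifest is truncated in this diff view; the piece that actually lands the pod on a GPU node is the resource request handled by the NVIDIA device plugin. A representative fragment, assumed rather than copied from the folded file:

```yaml
# Representative GPU request for the vLLM container (assumed, standard device-plugin usage)
resources:
  limits:
    nvidia.com/gpu: "1"
```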
AMD GPU:
You can refer to the `deployment.yaml` below if you are using an AMD ROCm GPU such as the MI300X.
<details>
<summary>Yaml</summary>
```yaml
apiVersion: apps/v1
kind: Deployment
@@ -305,12 +322,17 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
mountPath: /dev/shm
```
</details>
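Note the `/dev/shm` mount visible in the hunk above: vLLM relies on shared memory for inter-process communication, so the ROCm example backs it with a memory-backed emptyDir. A representative fragment, with the size an assumption to adjust for your model:

```yaml
# Representative shared-memory volume (size is an assumed example)
volumes:
  - name: shm
    emptyDir:
      medium: Memory
      sizeLimit: "8Gi"
```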
You can get the full example with steps and sample yaml files from <https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve>.
2. Create a Kubernetes Service for vLLM
Next, create a Kubernetes Service file to expose the `mistral-7b` deployment:
<details>
<summary>Yaml</summary>
```yaml
apiVersion: v1
kind: Service
@@ -330,6 +352,8 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
type: ClusterIP
```
</details>
3. Deploy and Test
Apply the deployment and service configurations using `kubectl apply -f <filename>`:
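A minimal end-to-end sketch; the filenames are assumptions, and the port-forward mapping depends on the ports defined in your Service:

```bash
# Apply the manifests (filenames are assumed examples)
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Forward the Service locally once the pod is running
kubectl port-forward svc/mistral-7b 8080:80   # assumed service port

# Smoke-test the OpenAI-compatible endpoint
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-7B-Instruct-v0.3", "prompt": "Hello", "max_tokens": 16}'
```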