[CI/Build] Add markdown linter (#11857)

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Rafael Vasquez committed 2025-01-12 03:17:13 -05:00 (committed by GitHub)
parent b25cfab9a0, commit 43f3d9e699
49 changed files with 585 additions and 560 deletions


@@ -47,13 +47,13 @@ Their values can be passed in when running `docker build` with `--build-arg` opt
To build vllm on ROCm 6.2 for MI200 and MI300 series, you can use the default:
```console
DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
```
To build vllm on ROCm 6.2 for Radeon RX7900 series (gfx1100), you should specify `BUILD_FA` as below:
```console
DOCKER_BUILDKIT=1 docker build --build-arg BUILD_FA="0" -f Dockerfile.rocm -t vllm-rocm .
```
To run the above docker image `vllm-rocm`, use the below command:
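The launch command itself falls outside this hunk; as a rough sketch, a ROCm container is typically started by exposing the kfd and dri devices and mounting the model directory (the flag set here is an assumption based on common ROCm Docker usage, not the exact command from the guide):
```console
docker run -it \
   --network=host \
   --group-add=video \
   --ipc=host \
   --cap-add=SYS_PTRACE \
   --security-opt seccomp=unconfined \
   --device /dev/kfd \
   --device /dev/dri \
   -v <path/to/model>:/app/model \
   vllm-rocm \
   bash
```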
@@ -83,81 +83,81 @@ Where the `<path/to/model>` is the location where the model is stored, for examp
- [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html)
- [PyTorch](https://pytorch.org/)
For installing PyTorch, you can start from a fresh docker image, e.g., `rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0`, `rocm/pytorch-nightly`.
Alternatively, you can install PyTorch using PyTorch wheels. You can check the PyTorch installation guide in PyTorch [Getting Started](https://pytorch.org/get-started/locally/).
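For illustration, a ROCm wheel is usually installed by pointing pip at the matching ROCm index; the exact index URL and torch version come from the Getting Started selector, so treat this as a sketch:
```console
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2
```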
1. Install [Triton flash attention for ROCm](https://github.com/ROCm/triton)
Install ROCm's Triton flash attention (the default triton-mlir branch) following the instructions from [ROCm/triton](https://github.com/ROCm/triton/blob/triton-mlir/README.md)
```console
python3 -m pip install ninja cmake wheel pybind11
pip uninstall -y triton
git clone https://github.com/OpenAI/triton.git
cd triton
git checkout e192dba
cd python
pip3 install .
cd ../..
```
```{note}
- If you see an HTTP error related to downloading packages while building Triton, please try again, as the error is intermittent.
```
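A quick, optional check (not part of the original instructions) that the freshly built Triton is the one Python picks up:
```console
python3 -c "import triton; print(triton.__version__)"
```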
2. Optionally, if you choose to use CK flash attention, you can install [flash attention for ROCm](https://github.com/ROCm/flash-attention/tree/ck_tile)
Install ROCm's flash attention (v2.5.9.post1) following the instructions from [ROCm/flash-attention](https://github.com/ROCm/flash-attention/tree/ck_tile#amd-gpurocm-support)
Alternatively, wheels intended for vLLM use can be downloaded from the repository's releases.
For example, for ROCm 6.2, suppose your gfx arch is `gfx90a`. To get your gfx architecture, run `rocminfo | grep gfx`.
```console
git clone https://github.com/ROCm/flash-attention.git
cd flash-attention
git checkout 3cea2fb
git submodule update --init
GPU_ARCHS="gfx90a" python3 setup.py install
cd ..
```
```{note}
- You might need to downgrade the "ninja" version to 1.10, as it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`).
```
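To confirm the CK build is importable before moving on, a minimal check (illustrative, not from the original guide):
```console
python3 -c "import flash_attn; print(flash_attn.__version__)"
```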
3. Build vLLM. For example, vLLM on ROCm 6.2 can be built with the following steps:
```bash
$ pip install --upgrade pip

# Install PyTorch
$ pip uninstall torch -y
$ pip install --no-cache-dir --pre torch==2.6.0.dev20241024 --index-url https://download.pytorch.org/whl/nightly/rocm6.2

# Build & install AMD SMI
$ pip install /opt/rocm/share/amd_smi

# Install dependencies
$ pip install --upgrade numba scipy huggingface-hub[cli]
$ pip install "numpy<2"
$ pip install -r requirements-rocm.txt

# Build vLLM for MI210/MI250/MI300.
$ export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
$ python3 setup.py develop
```
This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
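Once the build finishes, a quick sanity check (illustrative, not part of the original steps) is to import the editable install:
```console
python3 -c "import vllm; print(vllm.__version__)"
```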
```{tip}
- Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm-up step before collecting perf numbers.
- Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
- To use CK flash-attention or PyTorch naive attention, please set `export VLLM_USE_TRITON_FLASH_ATTN=0` to turn off Triton flash attention.
- Ideally, the ROCm version of PyTorch should match the ROCm driver version.
```
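For example, the flag can be exported before launching the OpenAI-compatible server (the server invocation below is just one possible entrypoint; the model path is a placeholder):
```console
export VLLM_USE_TRITON_FLASH_ATTN=0
python3 -m vllm.entrypoints.openai.api_server --model <path/to/model>
```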
```{tip}
- For MI300x (gfx942) users, to achieve optimal performance, please refer to the [MI300x tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) for performance optimization and tuning tips at the system and workflow level.
  For vLLM, please refer to [vLLM performance optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#vllm-performance-optimization).
```