[Docs] Reduce custom syntax used in docs (#27009)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 04:05:34 +01:00
parent 965c5f4914
commit 4ffd6e8942
65 changed files with 381 additions and 402 deletions
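Every hunk below applies one of two mechanical rules. `<gh-issue:N>` becomes a plain autolink to the vLLM GitHub issue. `<gh-file:PATH>` becomes a standard Markdown link whose text is the repo-relative PATH and whose target is PATH made relative to the doc file's own directory. That is why files under `docs/deployment/` get `../../` and files under `docs/deployment/frameworks/` get `../../../`. The Python code below is a minimal sketch of those two rules and handles only the two shorthand forms that appear in this diff. `rewrite_custom_syntax` is a hypothetical helper written for illustration, not the tool used for this commit. It assumes `doc_path` is given relative to the repository root.

```python
import re
from pathlib import PurePosixPath

# Base URL taken from the rewritten issue link in docs/deployment/docker.md.
ISSUE_URL = "https://github.com/vllm-project/vllm/issues/"

GH_ISSUE = re.compile(r"<gh-issue:(\d+)>")
GH_FILE = re.compile(r"<gh-file:([^>]+)>")


def rewrite_custom_syntax(text: str, doc_path: str) -> str:
    """Hypothetical helper: replace <gh-issue:N> and <gh-file:PATH> shorthands.

    Assumes doc_path is the Markdown file's path relative to the repository
    root, e.g. "docs/deployment/docker.md".
    """
    # One "../" per directory between the doc file and the repo root.
    depth = len(PurePosixPath(doc_path).parts) - 1
    prefix = "../" * depth

    # Issue shorthands become plain autolinks to GitHub.
    text = GH_ISSUE.sub(lambda m: f"<{ISSUE_URL}{m.group(1)}>", text)
    # File shorthands become relative links labelled with the repo-relative path.
    text = GH_FILE.sub(lambda m: f"[{m.group(1)}]({prefix}{m.group(1)})", text)
    return text


if __name__ == "__main__":
    before = "Build from <gh-file:docker/Dockerfile> (see <gh-issue:8030>)."
    print(rewrite_custom_syntax(before, "docs/deployment/docker.md"))
```

The helper computes relative targets rather than absolute GitHub URLs. This keeps the links resolvable both when the Markdown is browsed on GitHub and when the docs site is built from the repository checkout, which matches the form used in the hunks below.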
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -37,7 +37,7 @@ You can add any other [engine-args](../configuration/engine_args.md) you need af
    memory to share data between processes under the hood, particularly for tensor parallel inference.

 !!! note
-    Optional dependencies are not included in order to avoid licensing issues (e.g. <gh-issue:8030>).
+    Optional dependencies are not included in order to avoid licensing issues (e.g. <https://github.com/vllm-project/vllm/issues/8030>).

    If you need to use those dependencies (having accepted the license terms),
    create a custom Dockerfile on top of the base image with an extra layer that installs them:
@@ -66,7 +66,7 @@ You can add any other [engine-args](../configuration/engine_args.md) you need af

 ## Building vLLM's Docker Image from Source

-You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:
+You can build and run vLLM from source via the provided [docker/Dockerfile](../../docker/Dockerfile). To build vLLM:

 ```bash
 # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
--- a/docs/deployment/frameworks/anyscale.md
+++ b/docs/deployment/frameworks/anyscale.md
@@ -5,7 +5,7 @@
 [Anyscale](https://www.anyscale.com) is a managed, multi-cloud platform developed by the creators of Ray.

 Anyscale automates the entire lifecycle of Ray clusters in your AWS, GCP, or Azure account, delivering the flexibility of open-source Ray
-without the operational overhead of maintaining Kubernetes control planes, configuring autoscalers, managing observability stacks, or manually managing head and worker nodes with helper scripts like <gh-file:examples/online_serving/run_cluster.sh>.
+without the operational overhead of maintaining Kubernetes control planes, configuring autoscalers, managing observability stacks, or manually managing head and worker nodes with helper scripts like [examples/online_serving/run_cluster.sh](../../../examples/online_serving/run_cluster.sh).

 When serving large language models with vLLM, Anyscale can rapidly provision [production-ready HTTPS endpoints](https://docs.anyscale.com/examples/deploy-ray-serve-llms) or [fault-tolerant batch inference jobs](https://docs.anyscale.com/examples/ray-data-llm).

--- a/docs/deployment/frameworks/retrieval_augmented_generation.md
+++ b/docs/deployment/frameworks/retrieval_augmented_generation.md
@@ -36,7 +36,7 @@ pip install -U vllm \
    vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
    ```

-1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_langchain.py>
+1. Use the script: [examples/online_serving/retrieval_augmented_generation_with_langchain.py](../../../examples/online_serving/retrieval_augmented_generation_with_langchain.py)

 1. Run the script

@@ -74,7 +74,7 @@ pip install vllm \
    vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
    ```

-1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_llamaindex.py>
+1. Use the script: [examples/online_serving/retrieval_augmented_generation_with_llamaindex.py](../../../examples/online_serving/retrieval_augmented_generation_with_llamaindex.py)

 1. Run the script:

--- a/docs/deployment/frameworks/streamlit.md
+++ b/docs/deployment/frameworks/streamlit.md
@@ -20,7 +20,7 @@ pip install vllm streamlit openai
    vllm serve Qwen/Qwen1.5-0.5B-Chat
    ```

-1. Use the script: <gh-file:examples/online_serving/streamlit_openai_chatbot_webserver.py>
+1. Use the script: [examples/online_serving/streamlit_openai_chatbot_webserver.py](../../../examples/online_serving/streamlit_openai_chatbot_webserver.py)

 1. Start the streamlit web UI and start to chat: