[Docs] Reduce custom syntax used in docs (#27009)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
Harry Mellor
2025-10-17 04:05:34 +01:00
committed by GitHub
parent 965c5f4914
commit 4ffd6e8942
65 changed files with 381 additions and 402 deletions

View File

@@ -37,7 +37,7 @@ You can add any other [engine-args](../configuration/engine_args.md) you need af
memory to share data between processes under the hood, particularly for tensor parallel inference.
!!! note
Optional dependencies are not included in order to avoid licensing issues (e.g. <gh-issue:8030>).
Optional dependencies are not included in order to avoid licensing issues (e.g. <https://github.com/vllm-project/vllm/issues/8030>).
If you need to use those dependencies (having accepted the license terms),
create a custom Dockerfile on top of the base image with an extra layer that installs them:
@@ -66,7 +66,7 @@ You can add any other [engine-args](../configuration/engine_args.md) you need af
## Building vLLM's Docker Image from Source
You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:
You can build and run vLLM from source via the provided [docker/Dockerfile](../../docker/Dockerfile). To build vLLM:
```bash
# optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2

View File

@@ -5,7 +5,7 @@
[Anyscale](https://www.anyscale.com) is a managed, multi-cloud platform developed by the creators of Ray.
Anyscale automates the entire lifecycle of Ray clusters in your AWS, GCP, or Azure account, delivering the flexibility of open-source Ray
without the operational overhead of maintaining Kubernetes control planes, configuring autoscalers, managing observability stacks, or manually managing head and worker nodes with helper scripts like <gh-file:examples/online_serving/run_cluster.sh>.
without the operational overhead of maintaining Kubernetes control planes, configuring autoscalers, managing observability stacks, or manually managing head and worker nodes with helper scripts like [examples/online_serving/run_cluster.sh](../../../examples/online_serving/run_cluster.sh).
When serving large language models with vLLM, Anyscale can rapidly provision [production-ready HTTPS endpoints](https://docs.anyscale.com/examples/deploy-ray-serve-llms) or [fault-tolerant batch inference jobs](https://docs.anyscale.com/examples/ray-data-llm).

View File

@@ -36,7 +36,7 @@ pip install -U vllm \
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_langchain.py>
1. Use the script: [examples/online_serving/retrieval_augmented_generation_with_langchain.py](../../../examples/online_serving/retrieval_augmented_generation_with_langchain.py)
1. Run the script
@@ -74,7 +74,7 @@ pip install vllm \
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_llamaindex.py>
1. Use the script: [examples/online_serving/retrieval_augmented_generation_with_llamaindex.py](../../../examples/online_serving/retrieval_augmented_generation_with_llamaindex.py)
1. Run the script:

View File

@@ -20,7 +20,7 @@ pip install vllm streamlit openai
vllm serve Qwen/Qwen1.5-0.5B-Chat
```
1. Use the script: <gh-file:examples/online_serving/streamlit_openai_chatbot_webserver.py>
1. Use the script: [examples/online_serving/streamlit_openai_chatbot_webserver.py](../../../examples/online_serving/streamlit_openai_chatbot_webserver.py)
1. Start the streamlit web UI and start to chat: