[Dependency] Remove default ray dependency (#36170)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@@ -68,6 +68,12 @@ vLLM uses Ray to manage the distributed execution of tasks across multiple nodes
Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
Ray is an optional dependency. Install it explicitly before using Ray-based execution, for example:
```bash
pip install "ray[cgraph]"
```
For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
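Since Ray is no longer installed by default, code paths that use it need to degrade gracefully when the package is absent. Below is a minimal sketch of the standard guarded-import pattern for an optional dependency; the helper names `ray_is_available` and `require_ray` are illustrative, not vLLM's actual API.

```python
import importlib.util


def ray_is_available() -> bool:
    """Return True if the optional 'ray' package is importable."""
    # find_spec probes the import system without actually importing ray,
    # so this check is cheap and has no side effects.
    return importlib.util.find_spec("ray") is not None


def require_ray() -> None:
    """Raise a helpful error when a Ray-based backend is requested without ray installed."""
    # Hypothetical helper: fail fast with the install command from the docs above.
    if not ray_is_available():
        raise ImportError(
            "Ray is required for this execution backend. "
            'Install it with: pip install "ray[cgraph]"'
        )
```

With this pattern, the default installation stays lean, and users only pay for Ray when they explicitly opt into Ray-based distributed execution.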
### Ray cluster setup with containers
```diff
@@ -4,7 +4,6 @@
 numba == 0.61.2 # Required for N-gram speculative decoding

 # Dependencies for NVIDIA GPUs
-ray[cgraph]>=2.48.0
 torch==2.10.0
 torchaudio==2.10.0
 # These must be updated alongside torch
```
```diff
@@ -10,7 +10,6 @@ numba == 0.61.2 # Required for N-gram speculative decoding

 # Dependencies for AMD GPUs
 datasets
-ray[cgraph]>=2.48.0
 peft
 pytest-asyncio
 tensorizer==2.10.1
```