[Docs] Add Ray Serve LLM section to openai compatible server guide (#20595)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
This commit is contained in:
@@ -775,3 +775,17 @@ The following extra parameters are supported:
|
|||||||
```python
|
```python
|
||||||
--8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
|
--8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Ray Serve LLM
|
||||||
|
|
||||||
|
Ray Serve LLM enables scalable, production-grade serving of the vLLM engine. It integrates tightly with vLLM and extends it with features such as auto-scaling, load balancing, and back-pressure.
|
||||||
|
|
||||||
|
Key capabilities:
|
||||||
|
|
||||||
|
- Exposes an OpenAI-compatible HTTP API as well as a Pythonic API.
|
||||||
|
- Scales from a single GPU to a multi-node cluster without code changes.
|
||||||
|
- Provides observability and autoscaling policies through Ray dashboards and metrics.
|
||||||
|
|
||||||
|
The following example shows how to deploy a large model like DeepSeek R1 with Ray Serve LLM: <gh-file:examples/online_serving/ray_serve_deepseek.py>.
|
||||||
|
|
||||||
|
Learn more about Ray Serve LLM with the official [Ray Serve LLM documentation](https://docs.ray.io/en/latest/serve/llm/serving-llms.html).
|
||||||
|
|||||||
Reference in New Issue
Block a user