
# LiteLLM

[LiteLLM](https://github.com/BerriAI/litellm) lets you call all LLM APIs (Bedrock, Hugging Face, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.) using the OpenAI format.

LiteLLM handles:

- Translating inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses are always available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) via the [Router](https://docs.litellm.ai/docs/routing)
- Budgets and rate limits per project, API key, and model via the [LiteLLM Proxy Server (LLM Gateway)](https://docs.litellm.ai/docs/simple_proxy)
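
As a minimal sketch of the consistent-output point, the snippet below reads the generated text from a plain dictionary laid out like an OpenAI chat completion. The `sample_response` payload is a hand-written stand-in rather than real LiteLLM output, and `extract_text` is a hypothetical helper, not part of LiteLLM.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-format chat completion."""
    # LiteLLM keeps this path stable regardless of the backing provider.
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in for a chat completion response (illustrative only).
sample_response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "I'm doing well, thanks!"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_text(sample_response))
```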

LiteLLM also supports all models served by vLLM.

## Prerequisites

Install vLLM and LiteLLM:

```bash
pip install vllm litellm
```
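
To confirm both packages landed in the active environment, a stdlib-only check like the sketch below can help. `missing_modules` is a hypothetical helper, and the sketch assumes the import names are `vllm` and `litellm`.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["vllm", "litellm"])
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("vLLM and LiteLLM are installed.")
```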

## Deploy

### Chat completion

1. Start the vLLM server with a supported chat completion model, e.g.:

    ```bash
    vllm serve qwen/Qwen1.5-0.5B-Chat
    ```

1. Call it with LiteLLM:

    ??? code

        ```python
        import litellm

        messages = [{"content": "Hello, how are you?", "role": "user"}]

        # The "hosted_vllm/" prefix is required: it tells LiteLLM to send the
        # request to a vLLM server. The rest is the model name vLLM is serving.
        response = litellm.completion(
            model="hosted_vllm/qwen/Qwen1.5-0.5B-Chat",
            messages=messages,
            api_base="http://{your-vllm-server-host}:{your-vllm-server-port}/v1",
            temperature=0.2,
            max_tokens=80,
        )

        print(response)
        ```
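
The name after the `hosted_vllm/` prefix has to match the model name the vLLM server reports, and vLLM's OpenAI-compatible server lists those names at `GET /v1/models`. The stdlib sketch below builds LiteLLM model strings from that listing. `fetch_models` and `litellm_model_names` are hypothetical helpers, and `http://localhost:8000/v1` is an assumed address (8000 is the default port of `vllm serve`).

```python
import json
import urllib.request
from typing import Any


def fetch_models(api_base: str, timeout: float = 5.0) -> dict[str, Any]:
    """GET {api_base}/models from a running vLLM server and decode the JSON body."""
    with urllib.request.urlopen(f"{api_base}/models", timeout=timeout) as resp:
        return json.load(resp)


def litellm_model_names(payload: dict[str, Any]) -> list[str]:
    """Prefix every served model id with LiteLLM's hosted_vllm/ provider key."""
    return ["hosted_vllm/" + model["id"] for model in payload["data"]]


# Against a running server (assumed default address):
#   print(litellm_model_names(fetch_models("http://localhost:8000/v1")))

# Offline check with a hand-written payload shaped like the /v1/models response.
sample = {"object": "list", "data": [{"id": "qwen/Qwen1.5-0.5B-Chat", "object": "model"}]}
print(litellm_model_names(sample))  # ['hosted_vllm/qwen/Qwen1.5-0.5B-Chat']
```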

### Embeddings

1. Start the vLLM server with a supported embedding model, e.g.:

    ```bash
    vllm serve BAAI/bge-base-en-v1.5
    ```

1. Call it with LiteLLM:

    ```python
    import os

    from litellm import embedding

    os.environ["HOSTED_VLLM_API_BASE"] = "http://{your-vllm-server-host}:{your-vllm-server-port}/v1"

    # The "hosted_vllm/" prefix is required; the rest is the model name vLLM is serving.
    # Naming the result `response` avoids shadowing the imported `embedding` function.
    response = embedding(model="hosted_vllm/BAAI/bge-base-en-v1.5", input=["Hello world"])

    print(response)
    ```
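
Assuming the embedding response follows the OpenAI layout, with each vector under `data[i]['embedding']`, a pure-Python sketch for comparing two inputs by cosine similarity could look like this. The three-dimensional vectors are made up for illustration (real `bge-base-en-v1.5` vectors are much longer), and `vectors_from_response` and `cosine_similarity` are hypothetical helpers.

```python
import math
from typing import Any


def vectors_from_response(response: dict[str, Any]) -> list[list[float]]:
    """Collect embedding vectors from an OpenAI-format embedding response."""
    # Sort by "index" so the vectors line up with the order of the inputs.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hand-written stand-in for a response to input=["Hello world", "Hi there"].
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 1.0, 0.0]},
    ],
}

first, second = vectors_from_response(sample)
print(round(cosine_similarity(first, second), 4))  # 0.7071
```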

For details, see the tutorial [Using vLLM in LiteLLM](https://docs.litellm.ai/docs/providers/vllm).