# Llama Stack
vLLM is also available via [Llama Stack](https://github.com/llamastack/llama-stack).
To install Llama Stack, run:
```bash
pip install llama-stack -q
```
## Inference using OpenAI-Compatible API
Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:
```yaml
inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: http://127.0.0.1:8000
```
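The settings above assume a vLLM server is already listening on the configured URL. A minimal way to start one with vLLM's OpenAI-compatible server (the model name here is only an example; substitute any model you have access to):

```bash
# Launch vLLM's OpenAI-compatible server on the port used in the config above.
# meta-llama/Llama-3.1-8B-Instruct is illustrative; any supported model works.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```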
Please refer to [this guide](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_vllm.html) for more details on the remote vLLM provider.
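Once both servers are running, you can exercise the remote provider from Python. This is a sketch assuming the `llama-stack-client` package and a Llama Stack server at a local address; the base URL, model name, and exact response attributes may differ across versions, so treat them as placeholders:

```python
# Sketch: query a Llama Stack server backed by the remote vLLM provider.
# base_url and model_id below are assumptions -- adjust to your deployment.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://127.0.0.1:8321")

response = client.inference.chat_completion(
    model_id="Llama3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```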
## Inference using Embedded vLLM
An [inline provider](https://github.com/llamastack/llama-stack/tree/main/llama_stack/providers/inline/inference)
is also available. Here is a sample configuration using that method:
```yaml
inference:
  - provider_type: vllm
    config:
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4
```