diff --git a/docs/deployment/integrations/aibrix.md b/docs/deployment/integrations/aibrix.md
new file mode 100644
index 000000000..db32593cc
--- /dev/null
+++ b/docs/deployment/integrations/aibrix.md
@@ -0,0 +1,5 @@
+# AIBrix
+
+[AIBrix](https://github.com/vllm-project/aibrix) is a cloud-native control plane that integrates with vLLM to simplify Kubernetes deployment, scaling, routing, and LoRA adapter management for large language model inference.
+
+For installation and usage instructions, please refer to the [AIBrix documentation](https://aibrix.readthedocs.io/).
diff --git a/docs/deployment/integrations/dynamo.md b/docs/deployment/integrations/dynamo.md
new file mode 100644
index 000000000..8d0a0dcb0
--- /dev/null
+++ b/docs/deployment/integrations/dynamo.md
@@ -0,0 +1,7 @@
+# NVIDIA Dynamo
+
+[NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo) is an open-source framework for distributed LLM inference that can run vLLM on Kubernetes with flexible serving architectures (e.g. aggregated/disaggregated, optional router/planner).
+
+For Kubernetes deployment instructions and examples (including vLLM), see the [Deploying Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/blob/main/docs/kubernetes/README.md) guide.
+
+Background reading: InfoQ news coverage — [NVIDIA Dynamo simplifies Kubernetes deployment for LLM inference](https://www.infoq.com/news/2025/12/nvidia-dynamo-kubernetes/).
diff --git a/docs/deployment/integrations/kubeai.md b/docs/deployment/integrations/kubeai.md
index 89d072215..e183d43d0 100644
--- a/docs/deployment/integrations/kubeai.md
+++ b/docs/deployment/integrations/kubeai.md
@@ -5,6 +5,7 @@
 Please see the Installation Guides for environment specific instructions:
 
 - [Any Kubernetes Cluster](https://www.kubeai.org/installation/any/)
+- [AKS](https://www.kubeai.org/installation/aks/)
 - [EKS](https://www.kubeai.org/installation/eks/)
 - [GKE](https://www.kubeai.org/installation/gke/)
 
diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index 3d613d00b..dbcb27727 100644
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -11,6 +11,7 @@ Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine le
 Alternatively, you can deploy vLLM to Kubernetes using any of the following:
 
 - [Helm](frameworks/helm.md)
+- [NVIDIA Dynamo](integrations/dynamo.md)
 - [InftyAI/llmaz](integrations/llmaz.md)
 - [llm-d](integrations/llm-d.md)
 - [KAITO](integrations/kaito.md)
@@ -20,7 +21,7 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
 - [kubernetes-sigs/lws](frameworks/lws.md)
 - [meta-llama/llama-stack](integrations/llamastack.md)
 - [substratusai/kubeai](integrations/kubeai.md)
-- [vllm-project/aibrix](https://github.com/vllm-project/aibrix)
+- [vllm-project/AIBrix](integrations/aibrix.md)
 - [vllm-project/production-stack](integrations/production-stack.md)
 
 ## Deployment with CPUs
diff --git a/pyproject.toml b/pyproject.toml
index b786f0d59..b4b9334f8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -177,6 +177,7 @@ Pn = "Pn"
 arange = "arange"
 PARD = "PARD"
 pard = "pard"
+AKS = "AKS"
 
 [tool.typos.type.py]
 extend-glob = []