Stop using title frontmatter and fix doc that can only be reached by search (#20623)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -1,6 +1,5 @@
---
title: Anyscale
---
# Anyscale

[](){ #deployment-anyscale }

[Anyscale](https://www.anyscale.com) is a managed, multi-cloud platform developed by the creators of Ray.
@@ -1,6 +1,4 @@
---
title: Anything LLM
---
# Anything LLM

[Anything LLM](https://github.com/Mintplex-Labs/anything-llm) is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting.
@@ -1,6 +1,4 @@
---
title: AutoGen
---
# AutoGen

[AutoGen](https://github.com/microsoft/autogen) is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.
@@ -1,6 +1,4 @@
---
title: BentoML
---
# BentoML

[BentoML](https://github.com/bentoml/BentoML) allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-compliant image and deploy it on Kubernetes.
@@ -1,6 +1,4 @@
---
title: Cerebrium
---
# Cerebrium

<p align="center">
<img src="https://i.ibb.co/hHcScTT/Screenshot-2024-06-13-at-10-14-54.png" alt="vLLM_plus_cerebrium"/>
@@ -1,6 +1,4 @@
---
title: Chatbox
---
# Chatbox

[Chatbox](https://github.com/chatboxai/chatbox) is a desktop client for LLMs, available on Windows, Mac, Linux.
@@ -1,6 +1,4 @@
---
title: Dify
---
# Dify

[Dify](https://github.com/langgenius/dify) is an open-source LLM app development platform. Its intuitive interface combines agentic AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more, allowing you to quickly move from prototype to production.
@@ -1,6 +1,4 @@
---
title: dstack
---
# dstack

<p align="center">
<img src="https://i.ibb.co/71kx6hW/vllm-dstack.png" alt="vLLM_plus_dstack"/>
@@ -1,6 +1,4 @@
---
title: Haystack
---
# Haystack

# Haystack
@@ -1,6 +1,4 @@
---
title: Helm
---
# Helm

A Helm chart to deploy vLLM for Kubernetes
@@ -1,6 +1,4 @@
---
title: LiteLLM
---
# LiteLLM

[LiteLLM](https://github.com/BerriAI/litellm) call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]
@@ -1,6 +1,4 @@
---
title: Lobe Chat
---
# Lobe Chat

[Lobe Chat](https://github.com/lobehub/lobe-chat) is an open-source, modern-design ChatGPT/LLMs UI/Framework.
@@ -1,6 +1,4 @@
---
title: LWS
---
# LWS

LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads.
A major use case is for multi-host/multi-node distributed inference.
@@ -1,6 +1,4 @@
---
title: Modal
---
# Modal

vLLM can be run on cloud GPUs with [Modal](https://modal.com), a serverless computing platform designed for fast auto-scaling.
@@ -1,6 +1,4 @@
---
title: Open WebUI
---
# Open WebUI

1. Install the [Docker](https://docs.docker.com/engine/install/)
@@ -1,6 +1,4 @@
---
title: Retrieval-Augmented Generation
---
# Retrieval-Augmented Generation

[Retrieval-augmented generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that enables generative artificial intelligence (Gen AI) models to retrieve and incorporate new information. It modifies interactions with a large language model (LLM) so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information. Use cases include providing chatbot access to internal company data or generating responses based on authoritative sources.
@@ -1,6 +1,4 @@
---
title: SkyPilot
---
# SkyPilot

<p align="center">
<img src="https://imgur.com/yxtzPEu.png" alt="vLLM"/>
@@ -1,6 +1,4 @@
---
title: Streamlit
---
# Streamlit

[Streamlit](https://github.com/streamlit/streamlit) lets you transform Python scripts into interactive web apps in minutes, instead of weeks. Build dashboards, generate reports, or create chat apps.
@@ -1,5 +1,3 @@
---
title: NVIDIA Triton
---
# NVIDIA Triton

The [Triton Inference Server](https://github.com/triton-inference-server) hosts a tutorial demonstrating how to quickly deploy a simple [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) model using vLLM. Please see [Deploying a vLLM model in Triton](https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton) for more details.