[Doc] Convert docs to use colon fences (#12471)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor
2025-01-29 03:38:29 +00:00
committed by GitHub
parent a7e3eba66f
commit dd6a3a02cb
68 changed files with 2352 additions and 2341 deletions
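The conversion is mechanical. Each MyST directive opened with a backtick fence (for example `` ```{note} `` or `` ````{note} ``) becomes a colon fence (`:::{note}`). Its closing backticks become `:::`. Literal code blocks such as `` ```python `` keep their backticks. The following is a hypothetical sketch of such a line-based converter, not the tooling used in this commit. It assumes that directives are never nested inside other directives (MyST would need a longer outer colon fence, which this sketch does not emit) and that fences are not indented. Blockquoted fences (`> ` prefix) are handled.

```python
import re

# A backtick fence line, optionally inside a blockquote ("> "), e.g.
# "```{note}", "````{note}", "```python" or a bare closing "```".
FENCE = re.compile(r"^(?P<prefix>(?:>\s?)*)(?P<ticks>`{3,})(?P<rest>.*)$")


def to_colon_fences(text: str) -> str:
    """Rewrite backtick-fenced MyST directives as colon fences (sketch).

    Literal code blocks keep their backticks. Assumes directives are not
    nested in other directives and that fences are unindented.
    """
    out = []
    # One entry per currently open block: (backtick count, is_directive).
    stack: list[tuple[int, bool]] = []
    for line in text.splitlines():
        m = FENCE.match(line)
        if m is None:
            out.append(line)
            continue
        prefix, ticks, rest = m["prefix"], m["ticks"], m["rest"]
        # A bare fence at least as long as the opener closes the innermost block.
        closes = bool(stack) and not rest.strip() and len(ticks) >= stack[-1][0]
        if closes:
            _, is_directive = stack.pop()
            out.append(prefix + ":::" if is_directive else line)
        elif stack and not stack[-1][1]:
            # Inside a literal code block, fence-like lines are plain content.
            out.append(line)
        elif rest.startswith("{"):
            # Opening a directive such as {note} or {image}: switch to colons.
            stack.append((len(ticks), True))
            out.append(prefix + ":::" + rest)
        else:
            # Opening a literal code block (e.g. ```python): leave it untouched.
            stack.append((len(ticks), False))
            out.append(line)
    return "\n".join(out)
```

Applied to the `{note}` block in the last hunk, the four-backtick opener and closer become `:::{note}` and `:::`, while the inner `` ```python `` block is left as-is. The stack is needed so that the inner three-backtick closer is matched to the code block and not to the directive.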


@@ -4,19 +4,19 @@
This document provides an overview of the vLLM architecture.
-```{contents} Table of Contents
+:::{contents} Table of Contents
:depth: 2
:local: true
-```
+:::
## Entrypoints
vLLM provides a number of entrypoints for interacting with the system. The
following diagram shows the relationship between them.
-```{image} /assets/design/arch_overview/entrypoints.excalidraw.png
+:::{image} /assets/design/arch_overview/entrypoints.excalidraw.png
:alt: Entrypoints Diagram
-```
+:::
### LLM Class
@@ -84,9 +84,9 @@ More details on the API server can be found in the [OpenAI-Compatible Server](#o
The `LLMEngine` and `AsyncLLMEngine` classes are central to the functioning of
the vLLM system, handling model inference and asynchronous request processing.
-```{image} /assets/design/arch_overview/llm_engine.excalidraw.png
+:::{image} /assets/design/arch_overview/llm_engine.excalidraw.png
:alt: LLMEngine Diagram
-```
+:::
### LLMEngine
@@ -144,11 +144,11 @@ configurations affect the class we ultimately get.
The following figure shows the class hierarchy of vLLM:
-> ```{figure} /assets/design/hierarchy.png
+> :::{figure} /assets/design/hierarchy.png
> :align: center
> :alt: query
> :width: 100%
-> ```
+> :::
There are several important design choices behind this class hierarchy:
@@ -178,7 +178,7 @@ of a vision model and a language model. By making the constructor uniform, we
can easily create a vision model and a language model and compose them into a
vision-language model.
-````{note}
+:::{note}
To support this change, all vLLM models' signatures have been updated to:
```python
@@ -215,7 +215,7 @@ else:
```
This way, the model can work with both old and new versions of vLLM.
-````
+:::
3\. **Sharding and Quantization at Initialization**: Certain features require
changing the model weights. For example, tensor parallelism needs to shard the