[Doc] Convert docs to use colon fences (#12471)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor
2025-01-29 03:38:29 +00:00
committed by GitHub
parent a7e3eba66f
commit dd6a3a02cb
68 changed files with 2352 additions and 2341 deletions
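The mechanical rewrite in this commit (MyST ```` ```{directive} ```` code fences to `:::{directive}` colon fences) can be approximated with a short script. This is a rough sketch for illustration, not the tool actually used for the commit; it assumes directives are not nested and contain no literal code fences (the real change also had to bump the colon count on enclosing `tab-set` fences):

```python
import re

def convert_fences(text: str) -> str:
    """Convert MyST ```{directive} code-fence blocks to :::{directive} colon fences.

    Naive sketch: handles only flat (non-nested) directives and leaves
    ordinary ```lang code fences untouched.
    """
    out = []
    in_directive = False
    for line in text.splitlines():
        # An opening fence like ```{note} or ```{tab-item} Title
        m = re.match(r"^(\s*)```\{(\w[\w-]*)\}(.*)$", line)
        if m and not in_directive:
            out.append(f"{m.group(1)}:::{{{m.group(2)}}}{m.group(3)}")
            in_directive = True
        elif in_directive and re.match(r"^\s*```\s*$", line):
            # The matching closing fence of the directive we opened
            out.append(line.replace("```", ":::"))
            in_directive = False
        else:
            out.append(line)
    return "\n".join(out)
```

Plain code fences such as ```` ```python ```` pass through unchanged because only fences immediately followed by `{directive}` are rewritten.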


@@ -10,9 +10,9 @@ First, clone the PyTorch model code from the source repository.
For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
-```{warning}
+:::{warning}
Make sure to review and adhere to the original code's copyright and licensing terms!
-```
+:::
## 2. Make your code compatible with vLLM
@@ -80,10 +80,10 @@ def forward(
...
```
-```{note}
+:::{note}
Currently, vLLM supports the basic multi-head attention mechanism and its variant with rotary positional embeddings.
If your model employs a different attention mechanism, you will need to implement a new attention layer in vLLM.
-```
+:::
For reference, check out our [Llama implementation](gh-file:vllm/model_executor/models/llama.py). vLLM already supports a large number of models. It is recommended to find a model similar to yours and adapt it to your model's architecture. Check out <gh-dir:vllm/model_executor/models> for more examples.


@@ -4,7 +4,7 @@
This section provides more information on how to integrate a [PyTorch](https://pytorch.org/) model into vLLM.
-```{toctree}
+:::{toctree}
:caption: Contents
:maxdepth: 1
@@ -12,16 +12,16 @@ basic
registration
tests
multimodal
-```
-```{note}
+:::
+:::{note}
The complexity of adding a new model depends heavily on the model's architecture.
The process is considerably easier if the model shares a similar architecture with an existing model in vLLM.
However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
-```
-```{tip}
+:::
+:::{tip}
If you are encountering issues while integrating your model into vLLM, feel free to open a [GitHub issue](https://github.com/vllm-project/vllm/issues)
or ask on our [developer slack](https://slack.vllm.ai).
We will be happy to help you out!
-```
+:::


@@ -48,9 +48,9 @@ Further update the model as follows:
return vision_embeddings
```
-```{important}
+:::{important}
The returned `multimodal_embeddings` must be either a **3D {class}`torch.Tensor`** of shape `(num_items, feature_size, hidden_size)`, or a **list / tuple of 2D {class}`torch.Tensor`s** of shape `(feature_size, hidden_size)`, so that `multimodal_embeddings[i]` retrieves the embeddings generated from the `i`-th multimodal data item (e.g., image) of the request.
-```
+:::
- Implement {meth}`~vllm.model_executor.models.interfaces.SupportsMultiModal.get_input_embeddings` to merge `multimodal_embeddings` with text embeddings from the `input_ids`. If input processing for the model is implemented correctly (see sections below), then you can leverage the utility function we provide to easily merge the embeddings.
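The shape contract in the `important` block above can be made concrete. A small sketch, where the sizes are invented purely for illustration (only the shape relationships matter):

```python
import torch

num_items, feature_size, hidden_size = 2, 576, 4096  # illustrative sizes only

# Option 1: a single 3D tensor, one slice of embeddings per multimodal item
multimodal_embeddings = torch.zeros(num_items, feature_size, hidden_size)
assert multimodal_embeddings[0].shape == (feature_size, hidden_size)

# Option 2: a list of 2D tensors; feature_size may then vary per item
multimodal_embeddings = [
    torch.zeros(576, hidden_size),
    torch.zeros(144, hidden_size),  # e.g. a smaller image
]
assert multimodal_embeddings[1].shape == (144, hidden_size)
```

Either form lets `multimodal_embeddings[i]` select the embeddings of the `i`-th item; the list form additionally allows items with different feature sizes.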
@@ -89,10 +89,10 @@ Further update the model as follows:
+ class YourModelForImage2Seq(nn.Module, SupportsMultiModal):
```
-```{note}
+:::{note}
The model class does not have to be named {code}`*ForCausalLM`.
Check out [the HuggingFace Transformers documentation](https://huggingface.co/docs/transformers/model_doc/auto#multimodal) for some examples.
-```
+:::
## 2. Specify processing information
@@ -120,8 +120,8 @@ When calling the model, the output embeddings from the visual encoder are assign
containing placeholder feature tokens. Therefore, the number of placeholder feature tokens should be equal
to the size of the output embeddings.
-::::{tab-set}
-:::{tab-item} Basic example: LLaVA
+:::::{tab-set}
+::::{tab-item} Basic example: LLaVA
:sync: llava
Looking at the code of HF's `LlavaForConditionalGeneration`:
@@ -254,12 +254,12 @@ def get_mm_max_tokens_per_item(self, seq_len: int) -> Mapping[str, int]:
return {"image": self.get_max_image_tokens()}
```
-```{note}
+:::{note}
Our [actual code](gh-file:vllm/model_executor/models/llava.py) is more abstracted to support vision encoders other than CLIP.
-```
-::::
+:::
+:::::
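One subtlety visible in this hunk: a fence is closed by a run of the same character that is at least as long as its opener, so an enclosing directive must use more colons than anything inside it. Once the inner ```` ``` ```` admonitions become `:::` fences, the surrounding `tab-item` and `tab-set` fences have to grow from three/four colons to four/five. A minimal illustration of the nesting rule:

```markdown
:::::{tab-set}
::::{tab-item} Example
:::{note}
Inner directives use fewer colons than their parents.
:::
::::
:::::
```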
## 3. Specify dummy inputs
@@ -315,17 +315,17 @@ def get_dummy_processor_inputs(
Afterwards, create a subclass of {class}`~vllm.multimodal.processing.BaseMultiModalProcessor`
to fill in the missing details about HF processing.
-```{seealso}
+:::{seealso}
[Multi-Modal Data Processing](#mm-processing)
-```
+:::
### Multi-modal fields
Override {class}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_mm_fields_config` to
return a schema of the tensors outputted by the HF processor that are related to the input multi-modal items.
-::::{tab-set}
-:::{tab-item} Basic example: LLaVA
+:::::{tab-set}
+::::{tab-item} Basic example: LLaVA
:sync: llava
Looking at the model's `forward` method:
@@ -367,13 +367,13 @@ def _get_mm_fields_config(
)
```
-```{note}
+:::{note}
Our [actual code](gh-file:vllm/model_executor/models/llava.py) additionally supports
pre-computed image embeddings, which can be passed to the model via the `image_embeds` argument.
-```
-::::
+:::
+:::::
### Prompt replacements


@@ -17,17 +17,17 @@ After you have implemented your model (see [tutorial](#new-model-basic)), put it
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
Finally, update our [list of supported models](#supported-models) to promote your model!
-```{important}
+:::{important}
The list of models in each section should be maintained in alphabetical order.
-```
+:::
## Out-of-tree models
You can load an external model using a plugin without modifying the vLLM codebase.
-```{seealso}
+:::{seealso}
[vLLM's Plugin System](#plugin-system)
-```
+:::
To register the model, use the following code:
@@ -45,11 +45,11 @@ from vllm import ModelRegistry
ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
```
-```{important}
+:::{important}
If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
Read more about that [here](#supports-multimodal).
-```
-```{note}
+:::
+:::{note}
Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
-```
+:::
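The note above recommends packaging these snippets as a vLLM plugin. One way the registration function could be exposed is through the `vllm.general_plugins` entry-point group described in vLLM's plugin system docs; the package and module names below are placeholders, not part of this commit:

```toml
# pyproject.toml (hypothetical plugin package)
[project]
name = "vllm-your-model-plugin"
version = "0.1.0"

# vLLM discovers and calls this entry point at startup
[project.entry-points."vllm.general_plugins"]
register_your_model = "your_code:register_model"
```

Here `your_code/__init__.py` would define a `register_model()` function containing the `ModelRegistry.register_model(...)` call shown above.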


@@ -14,14 +14,14 @@ Without them, the CI for your PR will fail.
Include an example HuggingFace repository for your model in <gh-file:tests/models/registry.py>.
This enables a unit test that loads dummy weights to ensure that the model can be initialized in vLLM.
-```{important}
+:::{important}
The list of models in each section should be maintained in alphabetical order.
-```
-```{tip}
+:::
+:::{tip}
If your model requires a development version of HF Transformers, you can set
`min_transformers_version` to skip the test in CI until the model is released.
-```
+:::
## Optional Tests