[Doc] Convert docs to use colon fences (#12471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
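The conversion this commit applies can be pictured with a short script. This is a sketch only: the source does not say how the change was made, and `to_colon_fences` is a hypothetical helper, not part of vLLM or MyST. It rewrites backtick-fenced directives such as `{warning}` or `{note}` into colon fences and leaves ordinary code blocks alone. It assumes directive bodies hold Markdown rather than literal code, and that nested directives already use longer outer fences.

```python
import re

# A line made of three or more backticks, optionally followed by an info string.
FENCE = re.compile(r"^(`{3,})(.*)$")


def to_colon_fences(text: str) -> str:
    """Rewrite backtick-fenced directives as colon fences (hypothetical helper)."""
    out = []
    stack = []  # open blocks as (kind, fence_length); kind is "directive" or "code"
    for line in text.splitlines():
        match = FENCE.match(line)
        if match is None:
            out.append(line)
            continue
        ticks, rest = match.group(1), match.group(2)
        in_code = bool(stack) and stack[-1][0] == "code"
        if in_code:
            # Inside a plain code block only a bare fence of sufficient length closes it.
            if rest.strip() == "" and len(ticks) >= stack[-1][1]:
                stack.pop()
            out.append(line)
        elif rest.startswith("{"):
            # Opening of a directive: same fence length, colons instead of backticks.
            stack.append(("directive", len(ticks)))
            out.append(":" * len(ticks) + rest)
        elif rest.strip() == "" and stack and stack[-1] == ("directive", len(ticks)):
            # Bare fence that closes the innermost open directive.
            stack.pop()
            out.append(":" * len(ticks))
        else:
            # Anything else opens an ordinary code block, which is kept verbatim.
            stack.append(("code", len(ticks)))
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        "`" * 3 + "{note}",
        "Remember to check the license.",
        "`" * 3,
    ])
    print(to_colon_fences(sample))
```

Keeping the fence length unchanged means an outer directive that was longer than its children stays longer, which is what colon-fence nesting needs.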
@@ -10,9 +10,9 @@ First, clone the PyTorch model code from the source repository.
For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.

```{warning}
:::{warning}
Make sure to review and adhere to the original code's copyright and licensing terms!
```
:::

## 2. Make your code compatible with vLLM

@@ -80,10 +80,10 @@ def forward(
...
```

```{note}
:::{note}
Currently, vLLM supports the basic multi-head attention mechanism and its variant with rotary positional embeddings.
If your model employs a different attention mechanism, you will need to implement a new attention layer in vLLM.
```
:::

For reference, check out our [Llama implementation](gh-file:vllm/model_executor/models/llama.py). vLLM already supports a large number of models. It is recommended to find a model similar to yours and adapt it to your model's architecture. Check out <gh-dir:vllm/model_executor/models> for more examples.

@@ -4,7 +4,7 @@

This section provides more information on how to integrate a [PyTorch](https://pytorch.org/) model into vLLM.

```{toctree}
:::{toctree}
:caption: Contents
:maxdepth: 1

@@ -12,16 +12,16 @@ basic
registration
tests
multimodal
```
:::

```{note}
:::{note}
The complexity of adding a new model depends heavily on the model's architecture.
The process is fairly straightforward if the model shares a similar architecture with an existing model in vLLM.
However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
```
:::

```{tip}
:::{tip}
If you are encountering issues while integrating your model into vLLM, feel free to open a [GitHub issue](https://github.com/vllm-project/vllm/issues)
or ask on our [developer Slack](https://slack.vllm.ai).
We will be happy to help you out!
```
:::

@@ -48,9 +48,9 @@ Further update the model as follows:
return vision_embeddings
```

```{important}
:::{important}
The returned `multimodal_embeddings` must be either a **3D {class}`torch.Tensor`** of shape `(num_items, feature_size, hidden_size)`, or a **list / tuple of 2D {class}`torch.Tensor`'s** of shape `(feature_size, hidden_size)`, so that `multimodal_embeddings[i]` retrieves the embeddings generated from the `i`-th multimodal data item (e.g., image) of the request.
```
:::

- Implement {meth}`~vllm.model_executor.models.interfaces.SupportsMultiModal.get_input_embeddings` to merge `multimodal_embeddings` with text embeddings from the `input_ids`. If input processing for the model is implemented correctly (see sections below), then you can leverage the utility function we provide to easily merge the embeddings.
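
The layout rule stated in the `{important}` block above can be checked mechanically. The following is a minimal sketch, not vLLM code: `check_multimodal_embeddings` and `FakeTensor` are hypothetical names, and the stand-in class only mimics the `.shape` attribute so the example runs without PyTorch. A real `torch.Tensor` exposes `.shape` in the same way.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor that carries only a shape (illustration only)."""
    shape: Tuple[int, ...]


def check_multimodal_embeddings(embeddings, hidden_size: int) -> int:
    """Return the number of multimodal items, or raise ValueError on a bad layout."""
    if isinstance(embeddings, (list, tuple)):
        # List / tuple form: one 2D (feature_size, hidden_size) tensor per item.
        for i, item in enumerate(embeddings):
            shape = tuple(item.shape)
            if len(shape) != 2 or shape[1] != hidden_size:
                raise ValueError(f"item {i} has shape {shape}, expected (feature_size, {hidden_size})")
        return len(embeddings)
    # Tensor form: a single 3D (num_items, feature_size, hidden_size) tensor.
    shape = tuple(embeddings.shape)
    if len(shape) != 3 or shape[2] != hidden_size:
        raise ValueError(f"got shape {shape}, expected (num_items, feature_size, {hidden_size})")
    return shape[0]


if __name__ == "__main__":
    batched = FakeTensor((2, 576, 4096))
    per_item = [FakeTensor((576, 4096)), FakeTensor((576, 4096))]
    print(check_multimodal_embeddings(batched, 4096))
    print(check_multimodal_embeddings(per_item, 4096))
```

Either form lets `multimodal_embeddings[i]` address the `i`-th item, which is the property the rule is protecting.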
@@ -89,10 +89,10 @@ Further update the model as follows:
+ class YourModelForImage2Seq(nn.Module, SupportsMultiModal):
```

```{note}
:::{note}
The model class does not have to be named {code}`*ForCausalLM`.
Check out [the HuggingFace Transformers documentation](https://huggingface.co/docs/transformers/model_doc/auto#multimodal) for some examples.
```
:::

## 2. Specify processing information

@@ -120,8 +120,8 @@ When calling the model, the output embeddings from the visual encoder are assign
containing placeholder feature tokens. Therefore, the number of placeholder feature tokens should be equal
to the size of the output embeddings.

::::{tab-set}
:::{tab-item} Basic example: LLaVA
:::::{tab-set}
::::{tab-item} Basic example: LLaVA
:sync: llava

Looking at the code of HF's `LlavaForConditionalGeneration`:
@@ -254,12 +254,12 @@ def get_mm_max_tokens_per_item(self, seq_len: int) -> Mapping[str, int]:
return {"image": self.get_max_image_tokens()}
```

```{note}
:::{note}
Our [actual code](gh-file:vllm/model_executor/models/llava.py) is more abstracted to support vision encoders other than CLIP.
```

:::

::::
:::::
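
The requirement that the placeholder token count equal the size of the visual encoder's output can be made concrete with a little arithmetic. The sketch below is not taken from vLLM: `num_image_placeholder_tokens` is a hypothetical helper, and it assumes a ViT-style encoder that produces one embedding per square patch plus an optional class token, as CLIP does.

```python
def num_image_placeholder_tokens(
    image_size: int,
    patch_size: int,
    keep_cls_token: bool = False,
) -> int:
    """Number of embeddings a ViT-style encoder emits for one square image.

    Assumes the image is split into a grid of non-overlapping square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # The class token adds one embedding when it is not dropped.
    return num_patches + 1 if keep_cls_token else num_patches


if __name__ == "__main__":
    # A 336x336 image with 14x14 patches gives a 24x24 grid.
    print(num_image_placeholder_tokens(336, 14))                       # 576
    print(num_image_placeholder_tokens(336, 14, keep_cls_token=True))  # 577
```

Under these assumptions, a prompt for such a model would need 576 placeholder tokens per image when the class token is dropped.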

## 3. Specify dummy inputs

@@ -315,17 +315,17 @@ def get_dummy_processor_inputs(
Afterwards, create a subclass of {class}`~vllm.multimodal.processing.BaseMultiModalProcessor`
to fill in the missing details about HF processing.

```{seealso}
:::{seealso}
[Multi-Modal Data Processing](#mm-processing)
```
:::

### Multi-modal fields

Override {class}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_mm_fields_config` to
return a schema of the tensors outputted by the HF processor that are related to the input multi-modal items.

::::{tab-set}
:::{tab-item} Basic example: LLaVA
:::::{tab-set}
::::{tab-item} Basic example: LLaVA
:sync: llava

Looking at the model's `forward` method:
@@ -367,13 +367,13 @@ def _get_mm_fields_config(
)
```

```{note}
:::{note}
Our [actual code](gh-file:vllm/model_executor/models/llava.py) additionally supports
pre-computed image embeddings, which can be passed to the model via the `image_embeds` argument.
```

:::

::::
:::::

### Prompt replacements

@@ -17,17 +17,17 @@ After you have implemented your model (see [tutorial](#new-model-basic)), put it
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
Finally, update our [list of supported models](#supported-models) to promote your model!

```{important}
:::{important}
The list of models in each section should be maintained in alphabetical order.
```
:::

## Out-of-tree models

You can load an external model using a plugin without modifying the vLLM codebase.

```{seealso}
:::{seealso}
[vLLM's Plugin System](#plugin-system)
```
:::

To register the model, use the following code:

@@ -45,11 +45,11 @@ from vllm import ModelRegistry
ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
```

```{important}
:::{important}
If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
Read more about that [here](#supports-multimodal).
```
:::

```{note}
:::{note}
Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
```
:::
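
As an illustration of the plugin route recommended in the note above, the registration call can be wrapped in a function that vLLM invokes through a Python entry point. This is a sketch under stated assumptions rather than the project's reference plugin: the package name `your_plugin` is hypothetical, `your_code:YourModelForCausalLM` is carried over from the snippet in this hunk, and the `vllm.general_plugins` entry-point group should be checked against the plugin system documentation for your vLLM version.

```python
# Hypothetical contents of your_plugin/__init__.py.

ARCHITECTURE = "YourModelForCausalLM"
MODEL_PATH = "your_code:YourModelForCausalLM"  # "module:ClassName", resolved lazily

# Entry-point declaration to place in the plugin's packaging metadata,
# for example as the entry_points argument of setuptools.setup().
ENTRY_POINTS = {
    "vllm.general_plugins": ["register_your_model = your_plugin:register"],
}


def register() -> None:
    """Register the out-of-tree model; meant to be called by vLLM's plugin loader."""
    # Imported inside the function so that importing this module does not pull in vLLM.
    from vllm import ModelRegistry

    # Skip registration if the architecture is already known, since the
    # function may run in more than one process.
    if ARCHITECTURE not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(ARCHITECTURE, MODEL_PATH)
```

Passing the model as a `"module:ClassName"` string, as the original snippet does, keeps the model code from being imported until the architecture is actually requested.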
@@ -14,14 +14,14 @@ Without them, the CI for your PR will fail.
Include an example HuggingFace repository for your model in <gh-file:tests/models/registry.py>.
This enables a unit test that loads dummy weights to ensure that the model can be initialized in vLLM.

```{important}
:::{important}
The list of models in each section should be maintained in alphabetical order.
```
:::

```{tip}
:::{tip}
If your model requires a development version of HF Transformers, you can set
`min_transformers_version` to skip the test in CI until the model is released.
```
:::

## Optional Tests