[Model] Support E5-V (#9576)

Cyrus Leung
2024-10-23 11:35:29 +08:00
committed by GitHub
parent 29061ed9df
commit 831540cf04
12 changed files with 528 additions and 86 deletions

@@ -334,6 +334,14 @@ The following modalities are supported depending on the model:
- **V**\ ideo
- **A**\ udio
Any combination of modalities joined by :code:`+` is supported.
- e.g.: :code:`T + I` means that the model supports text-only, image-only, and text-with-image inputs.
On the other hand, modalities separated by :code:`/` are mutually exclusive.
- e.g.: :code:`T / I` means that the model supports text-only and image-only inputs, but not text-with-image inputs.
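
The notation above can be read mechanically. The following is a minimal sketch, not part of vLLM: the helper name ``supported_inputs`` and the ``MODALITIES`` table are hypothetical, and it assumes that :code:`/` binds more loosely than :code:`+` when both appear in one entry. It expands a table entry into the set of input combinations the entry allows.

```python
from itertools import combinations
from typing import FrozenSet, Set

# Letter codes used in the supported-models tables (illustrative mapping).
MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}


def supported_inputs(notation: str) -> Set[FrozenSet[str]]:
    """Expand a notation such as "T + I" or "T / I" into allowed input sets.

    Hypothetical helper: "/" separates mutually exclusive groups, and "+"
    joins modalities that may be combined freely within a group.
    """
    allowed: Set[FrozenSet[str]] = set()
    for group in notation.split("/"):
        members = [m.strip() for m in group.split("+")]
        for m in members:
            if m not in MODALITIES:
                raise ValueError(f"unknown modality code: {m!r}")
        # Every non-empty subset of a "+" group is a valid input.
        for size in range(1, len(members) + 1):
            for combo in combinations(members, size):
                allowed.add(frozenset(combo))
    return allowed
```

Under these assumptions, ``supported_inputs("T + I")`` contains the text-only, image-only, and text-with-image combinations, while ``supported_inputs("T / I")`` contains only the two single-modality combinations.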
.. _supported_vlms:
Text Generation
@@ -484,6 +492,12 @@ Multimodal Embedding
- Example HF Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`LlavaNextForConditionalGeneration`
- LLaVA-NeXT-based
- T / I
- :code:`royokong/e5-v`
-
- ✅︎
* - :code:`Phi3VForCausalLM`
- Phi-3-Vision-based
- T + I