[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221)
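
To illustrate what "multiple image embeddings input with varied resolutions" implies for the input layout, here is a minimal sketch, not the code from this commit. It assumes that the embeddings for all items are concatenated along the token axis, that each item comes with a `(t, h, w)` patch grid, and that Qwen2-VL's default spatial merge size of 2 applies, so an item contributes `t * h * w / 4` tokens. The helper names `tokens_per_item` and `split_embeddings` are hypothetical and are not part of vLLM.

```python
from typing import List, Sequence, Tuple

# Assumption: Qwen2-VL's default spatial merge size (2x2 patches -> 1 token).
MERGE_SIZE = 2

Grid = Tuple[int, int, int]  # (temporal, height, width) in patches


def tokens_per_item(grid_thw: Grid, merge_size: int = MERGE_SIZE) -> int:
    """Number of embedding tokens one image or video contributes."""
    t, h, w = grid_thw
    return t * h * w // (merge_size ** 2)


def split_embeddings(
    embeds: Sequence,
    grids: Sequence[Grid],
    merge_size: int = MERGE_SIZE,
) -> List[Sequence]:
    """Split concatenated per-token embeddings back into per-item chunks.

    `embeds` is any sequence indexed by token (for example a list of rows);
    `grids` holds one (t, h, w) entry per image or video.
    """
    sizes = [tokens_per_item(g, merge_size) for g in grids]
    if sum(sizes) != len(embeds):
        raise ValueError(
            f"expected {sum(sizes)} tokens from grids, got {len(embeds)}"
        )
    chunks, start = [], 0
    for size in sizes:
        chunks.append(embeds[start:start + size])
        start += size
    return chunks


# Two images with different resolutions: 4x4 and 8x4 patch grids.
image_grids = [(1, 4, 4), (1, 8, 4)]
fake_embeds = list(range(12))  # stand-in for 12 embedding rows
per_image = split_embeddings(fake_embeds, image_grids)
```

Because the grid is carried per item, the same bookkeeping covers a video: only the temporal dimension `t` grows beyond 1.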

Signed-off-by: imkero <kerorek@outlook.com>
2024-11-13 15:07:22 +08:00
parent 032fcf16ae
commit 3945c82346
3 changed files with 577 additions and 31 deletions
--- a/docs/source/models/supported_models.rst
+++ b/docs/source/models/supported_models.rst
@@ -538,7 +538,7 @@ Text Generation
    - ✅︎
  * - :code:`Qwen2VLForConditionalGeneration`
    - Qwen2-VL
-    - T + I\ :sup:`E+` + V\ :sup:`+`
+    - T + I\ :sup:`E+` + V\ :sup:`E+`
    - :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
    - ✅︎
    - ✅︎