[Model] Support NVLM-D and fix QK Norm in InternViT (#9045)

Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
This commit is contained in:
Cyrus Leung
2024-10-07 19:55:12 +08:00
committed by GitHub
parent f19da64871
commit 151ef4efd2
12 changed files with 518 additions and 236 deletions

View File

@@ -315,6 +315,9 @@ Multimodal Language Models
.. _supported_vlms:
Text Generation
---------------
.. list-table::
:widths: 25 25 25 25 5 5
:header-rows: 1
@@ -384,7 +387,13 @@ Multimodal Language Models
- Image
- :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
-
-
* - :code:`NVLM_D_Model`
- NVLM-D 1.0
- Image\ :sup:`E+`
- :code:`nvidia/NVLM-D-72B`, etc.
-
- ✅︎
* - :code:`PaliGemmaForConditionalGeneration`
- PaliGemma
- Image\ :sup:`E`