[Model] Support NVLM-D and fix QK Norm in InternViT (#9045)

Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 19:55:12 +08:00
parent f19da64871
commit 151ef4efd2
12 changed files with 518 additions and 236 deletions
--- a/docs/source/models/supported_models.rst
+++ b/docs/source/models/supported_models.rst
@@ -315,6 +315,9 @@ Multimodal Language Models

 .. _supported_vlms:

+Text Generation
+---------------
+
 .. list-table::
  :widths: 25 25 25 25 5 5
  :header-rows: 1
@@ -384,7 +387,13 @@ Multimodal Language Models
    - Image
    - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
    -
+    -
+  * - :code:`NVLM_D_Model`
+    - NVLM-D 1.0
+    - Image\ :sup:`E+`
+    - :code:`nvidia/NVLM-D-72B`, etc.
    - 
+    - ✅︎
  * - :code:`PaliGemmaForConditionalGeneration`
    - PaliGemma
    - Image\ :sup:`E`