[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
@@ -7,10 +7,12 @@ vLLM supports a variety of generative Transformer models in `HuggingFace Transfo
The following is the list of model architectures that are currently supported by vLLM.
Alongside each architecture, we include some popular models that use it.

----

Text-only Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^

Text Generation
---------------

Decoder-only Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
  :widths: 25 25 50 5 5
  :header-rows: 1
@@ -40,6 +42,11 @@ Decoder-only Language Models
    - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
    -
    - ✅︎
  * - :code:`BartForConditionalGeneration`
    - BART
    - :code:`facebook/bart-base`, :code:`facebook/bart-large-cnn`, etc.
    -
    -
  * - :code:`ChatGLMModel`
    - ChatGLM
    - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
@@ -259,11 +266,55 @@ Decoder-only Language Models

.. note::
  Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.

Text Embedding
--------------
.. list-table::
  :widths: 25 25 50 5 5
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
    - :ref:`LoRA <lora>`
    - :ref:`PP <distributed_serving>`
  * - :code:`Gemma2Model`
    - Gemma2-based
    - :code:`BAAI/bge-multilingual-gemma2`, etc.
    -
    - ✅︎
  * - :code:`MistralModel`
    - Mistral-based
    - :code:`intfloat/e5-mistral-7b-instruct`, etc.
    -
    - ✅︎
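The embedding architectures above return one dense vector per prompt rather than generated text; downstream, such vectors are typically compared with cosine similarity (e.g. for retrieval or reranking). A minimal sketch in plain Python, with toy vectors standing in for real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding-model outputs.
query = [0.1, 0.3, 0.5, 0.2]
doc = [0.2, 0.6, 1.0, 0.4]  # same direction as the query, different magnitude

print(round(cosine_similarity(query, doc), 6))  # scale-invariant -> 1.0
```

Because cosine similarity normalizes by vector length, it scores direction rather than magnitude, which is why the two toy vectors above score 1.0.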

Reward Modeling
---------------

.. list-table::
  :widths: 25 25 50 5 5
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
    - :ref:`LoRA <lora>`
    - :ref:`PP <distributed_serving>`
  * - :code:`Qwen2ForRewardModel`
    - Qwen2-based
    - :code:`Qwen/Qwen2.5-Math-RM-72B`, etc.
    -
    - ✅︎
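The :ref:`PP <distributed_serving>` column marks architectures that can run with pipeline parallelism, where each rank holds a contiguous slice of the model's layers. As a rough illustration of the idea only (a simplified partitioning scheme, not vLLM's actual implementation):

```python
def partition_layers(num_layers: int, pp_size: int) -> list:
    """Split num_layers contiguous layers into pp_size pipeline stages,
    giving earlier stages one extra layer when the split is uneven."""
    base, extra = divmod(num_layers, pp_size)
    stages, start = [], 0
    for rank in range(pp_size):
        size = base + (1 if rank < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

# A 32-layer decoder split across 4 pipeline stages: 8 layers per rank.
print([len(stage) for stage in partition_layers(32, 4)])  # [8, 8, 8, 8]
```

Each rank then runs only its own slice, passing hidden states to the next stage, which is why a model must expose a layer-wise structure for PP support to apply.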

.. note::
  As an interim measure, these models are supported via the Embeddings API. See `this RFC <https://github.com/vllm-project/vllm/issues/8967>`_ for upcoming changes.

.. _supported_vlms:

Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
  :widths: 25 25 25 25 5 5
  :header-rows: 1
@@ -378,6 +429,7 @@ Multimodal Language Models

For :code:`openbmb/MiniCPM-V-2`, the official repo doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
For more details, please see: https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630

----

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`.