[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -3,11 +3,21 @@
 Supported Models
 ================
 
-vLLM supports a variety of generative and embedding models from `HuggingFace (HF) Transformers <https://huggingface.co/models>`_.
-This page lists the model architectures that are currently supported by vLLM.
+vLLM supports generative and pooling models across various tasks.
+If a model supports more than one task, you can set the task via the :code:`--task` argument.
+
+For each task, we list the model architectures that have been implemented in vLLM.
 Alongside each architecture, we include some popular models that use it.
 
-For other models, you can check the :code:`config.json` file inside the model repository.
+Loading a Model
+^^^^^^^^^^^^^^^
+
+HuggingFace Hub
++++++++++++++++
+
+By default, vLLM loads models from `HuggingFace (HF) Hub <https://huggingface.co/models>`_.
+
+To determine whether a given model is supported, you can check the :code:`config.json` file inside the HF repository.
 If the :code:`"architectures"` field contains a model architecture listed below, then it should be supported in theory.
 
 .. tip::
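The :code:`config.json` check described above is easy to script. The sketch below is illustrative only: the :code:`SUPPORTED` set and the helper function are hypothetical stand-ins for the architecture tables on this page, not part of vLLM.

```python
import json

# Illustrative subset standing in for the full architecture tables;
# is_supported() is a hypothetical helper, not a vLLM API.
SUPPORTED = {"LlamaForCausalLM", "Qwen2ForCausalLM"}

def is_supported(config_text: str) -> bool:
    """Check whether a repo's config.json lists a supported architecture."""
    archs = json.loads(config_text).get("architectures", [])
    return any(a in SUPPORTED for a in archs)

print(is_supported('{"architectures": ["LlamaForCausalLM"]}'))  # True
print(is_supported('{"architectures": ["FooModel"]}'))          # False
```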
@@ -17,38 +27,57 @@ If the :code:`"architectures"` field contains a model architecture listed below,
 
         from vllm import LLM
 
-        llm = LLM(model=...)  # Name or path of your model
+        # For generative models (task=generate) only
+        llm = LLM(model=..., task="generate")  # Name or path of your model
         output = llm.generate("Hello, my name is")
         print(output)
 
-If vLLM successfully generates text, it indicates that your model is supported.
+        # For pooling models (task={embed,classify,reward}) only
+        llm = LLM(model=..., task="embed")  # Name or path of your model
+        output = llm.encode("Hello, my name is")
+        print(output)
+
+If vLLM successfully returns text (for generative models) or hidden states (for pooling models), it indicates that your model is supported.
 
 Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
 for instructions on how to implement your model in vLLM.
 Alternatively, you can `open an issue on GitHub <https://github.com/vllm-project/vllm/issues/new/choose>`_ to request vLLM support.
 
-.. note::
-    To use models from `ModelScope <https://www.modelscope.cn>`_ instead of HuggingFace Hub, set an environment variable:
+ModelScope
+++++++++++
+
+To use models from `ModelScope <https://www.modelscope.cn>`_ instead of HuggingFace Hub, set an environment variable:
 
-    .. code-block:: shell
+.. code-block:: shell
 
-        $ export VLLM_USE_MODELSCOPE=True
+    $ export VLLM_USE_MODELSCOPE=True
 
-    And use with :code:`trust_remote_code=True`.
+And use with :code:`trust_remote_code=True`.
 
-    .. code-block:: python
+.. code-block:: python
 
-        from vllm import LLM
+    from vllm import LLM
 
-        llm = LLM(model=..., revision=..., trust_remote_code=True)  # Name or path of your model
-        output = llm.generate("Hello, my name is")
-        print(output)
+    llm = LLM(model=..., revision=..., task=..., trust_remote_code=True)
+
+    # For generative models (task=generate) only
+    output = llm.generate("Hello, my name is")
+    print(output)
+
+    # For pooling models (task={embed,classify,reward}) only
+    output = llm.encode("Hello, my name is")
+    print(output)
 
-Text-only Language Models
-^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Text Generation
----------------
+List of Text-only Language Models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Generative Models
++++++++++++++++++
+
+See :ref:`this page <generative_models>` for more information on how to use generative models.
+
+Text Generation (``--task generate``)
+-------------------------------------
 
 .. list-table::
   :widths: 25 25 50 5 5
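The generate/encode split introduced by this change can be summarized as a small dispatch: generative tasks go through :code:`llm.generate`, pooling tasks through :code:`llm.encode`. The sketch below is a plain-Python illustration of that mapping (the task names come from the docs above; the helper function is hypothetical, not vLLM internals):

```python
# Task names as documented above; api_method_for_task() is a hypothetical
# helper illustrating the dispatch, not a vLLM function.
GENERATIVE_TASKS = {"generate"}
POOLING_TASKS = {"embed", "classify", "reward"}

def api_method_for_task(task: str) -> str:
    """Return which LLM method a given --task value maps to."""
    if task in GENERATIVE_TASKS:
        return "generate"  # produces text
    if task in POOLING_TASKS:
        return "encode"    # produces hidden states / pooled vectors
    raise ValueError(f"unknown task: {task!r}")

print(api_method_for_task("generate"))  # generate
print(api_method_for_task("embed"))     # encode
```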
@@ -328,8 +357,24 @@ Text Generation
 .. note::
     Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
 
-Text Embedding
---------------
+Pooling Models
+++++++++++++++
+
+See :ref:`this page <pooling_models>` for more information on how to use pooling models.
+
+.. important::
+    Since some model architectures support both generative and pooling tasks,
+    you should explicitly specify the task type to ensure that the model is used in pooling mode instead of generative mode.
+
+Text Embedding (``--task embed``)
+---------------------------------
+
+Any text generation model can be converted into an embedding model by passing :code:`--task embed`.
+
+.. note::
+    To get the best results, you should use pooling models that are specifically trained as such.
+
+The following table lists those that are tested in vLLM.
 
 .. list-table::
   :widths: 25 25 50 5 5
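The "pooling" these sections refer to reduces a sequence of per-token hidden states to a single vector. Below is a minimal mean-pooling sketch in plain Python, shown only to make the concept concrete; vLLM's actual pooler is configurable (e.g. via :code:`--override-pooler-config`) and this is not its implementation:

```python
def mean_pool(hidden_states: list[list[float]]) -> list[float]:
    """Average per-token hidden states into one sentence-level vector.

    hidden_states: one vector per token, all of the same dimension.
    """
    dim = len(hidden_states[0])
    n = len(hidden_states)
    return [sum(tok[i] for tok in hidden_states) / n for i in range(dim)]

# Two tokens with hidden size 2:
states = [[1.0, 2.0], [3.0, 4.0]]
print(mean_pool(states))  # [2.0, 3.0]
```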
@@ -371,13 +416,6 @@ Text Embedding
     -
     -
 
-.. important::
-    Some model architectures support both generation and embedding tasks.
-    In this case, you have to pass :code:`--task embedding` to run the model in embedding mode.
-
 .. tip::
     You can override the model's pooling method by passing :code:`--override-pooler-config`.
 
-.. note::
-    :code:`ssmits/Qwen2-7B-Instruct-embed-base` has an improperly defined Sentence Transformers config.
-    You should manually set mean pooling by passing :code:`--override-pooler-config '{"pooling_type": "MEAN"}'`.
@@ -389,8 +427,8 @@ Text Embedding
     On the other hand, its 1.5B variant (:code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
     despite being described otherwise on its model card.
 
-Reward Modeling
----------------
+Reward Modeling (``--task reward``)
+-----------------------------------
 
 .. list-table::
   :widths: 25 25 50 5 5
@@ -416,11 +454,8 @@ Reward Modeling
     For process-supervised reward models such as :code:`peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
     e.g.: :code:`--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "returned_token_ids": [456, 789]}'`.
 
-.. note::
-    As an interim measure, these models are supported in both offline and online inference via Embeddings API.
-
-Classification
----------------
+Classification (``--task classify``)
+------------------------------------
 
 .. list-table::
   :widths: 25 25 50 5 5
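The STEP pooler config shown for process-supervised reward models can be pictured as selecting per-token scores at the positions of a designated step-tag token. The sketch below parses the same JSON the flag accepts but deliberately simplifies the selection (handling of :code:`returned_token_ids` is omitted); it is a rough illustration of the idea, not vLLM's implementation:

```python
import json

# The JSON string is the literal value from the flag documented above.
cfg = json.loads(
    '{"pooling_type": "STEP", "step_tag_id": 123, "returned_token_ids": [456, 789]}'
)

def step_scores(token_ids: list[int], per_token_scores: list[float], cfg: dict) -> list[float]:
    """Keep scores only at positions where the step-tag token appears.

    Simplified sketch: real STEP pooling also restricts the score to the
    logits of cfg["returned_token_ids"], which is omitted here.
    """
    assert cfg["pooling_type"] == "STEP"
    return [s for t, s in zip(token_ids, per_token_scores) if t == cfg["step_tag_id"]]

tokens = [10, 123, 11, 12, 123]       # step tag (123) marks the end of each step
scores = [0.1, 0.9, 0.2, 0.3, 0.7]
print(step_scores(tokens, scores, cfg))  # [0.9, 0.7]
```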
@@ -437,11 +472,8 @@ Classification
     - ✅︎
     - ✅︎
 
-.. note::
-    As an interim measure, these models are supported in both offline and online inference via Embeddings API.
-
-Sentence Pair Scoring
----------------------
+Sentence Pair Scoring (``--task score``)
+----------------------------------------
 
 .. list-table::
   :widths: 25 25 50 5 5
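A sentence-pair score is a single scalar for two texts. As a hedged illustration only: models run under ``--task score`` are typically cross-encoders that score the pair jointly, whereas the sketch below uses cosine similarity between two pooled vectors as a simple stand-in for what such a scalar looks like:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (illustrative stand-in only)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical directions)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```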
@@ -468,13 +500,10 @@ Sentence Pair Scoring
     -
     -
 
-.. note::
-    These models are supported in both offline and online inference via Score API.
-
 .. _supported_mm_models:
 
-Multimodal Language Models
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+List of Multimodal Language Models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The following modalities are supported depending on the model:
@@ -491,8 +520,15 @@ On the other hand, modalities separated by :code:`/` are mutually exclusive.
 
 - e.g.: :code:`T / I` means that the model supports text-only and image-only inputs, but not text-with-image inputs.
 
-Text Generation
----------------
+See :ref:`this page <multimodal_inputs>` on how to pass multi-modal inputs to the model.
+
+Generative Models
++++++++++++++++++
+
+See :ref:`this page <generative_models>` for more information on how to use generative models.
+
+Text Generation (``--task generate``)
+-------------------------------------
 
 .. list-table::
   :widths: 25 25 15 20 5 5 5
@@ -696,8 +732,24 @@ Text Generation
     The official :code:`openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
     For more details, please see: https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630
 
-Multimodal Embedding
---------------------
+Pooling Models
+++++++++++++++
+
+See :ref:`this page <pooling_models>` for more information on how to use pooling models.
+
+.. important::
+    Since some model architectures support both generative and pooling tasks,
+    you should explicitly specify the task type to ensure that the model is used in pooling mode instead of generative mode.
+
+Text Embedding (``--task embed``)
+---------------------------------
+
+Any text generation model can be converted into an embedding model by passing :code:`--task embed`.
+
+.. note::
+    To get the best results, you should use pooling models that are specifically trained as such.
+
+The following table lists those that are tested in vLLM.
 
 .. list-table::
   :widths: 25 25 15 25 5 5
@@ -728,12 +780,7 @@ Multimodal Embedding
     -
     - ✅︎
 
-.. important::
-    Some model architectures support both generation and embedding tasks.
-    In this case, you have to pass :code:`--task embedding` to run the model in embedding mode.
-
-.. tip::
-    You can override the model's pooling method by passing :code:`--override-pooler-config`.
+----
 
 Model Support Policy
 =====================