[Model] Support is_causal HF config field for Qwen2 model (#10621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -342,7 +342,7 @@ Text Embedding
      - ✅︎
    * - :code:`Qwen2Model`, :code:`Qwen2ForCausalLM`
      - Qwen2-based
-     - :code:`ssmits/Qwen2-7B-Instruct-embed-base`, :code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`, etc.
+     - :code:`ssmits/Qwen2-7B-Instruct-embed-base`, :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
      - ✅︎
      - ✅︎
    * - :code:`RobertaModel`, :code:`RobertaForMaskedLM`
@@ -363,6 +363,13 @@ Text Embedding
 .. tip::
     You can override the model's pooling method by passing :code:`--override-pooler-config`.
 
+.. note::
+   Unlike base Qwen2, :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
+   You can set :code:`--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
+
+   On the other hand, its 1.5B variant (:code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
+   despite being described otherwise on its model card.
+
 Reward Modeling
 ---------------
 
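The :code:`--hf-overrides` flag added in this change takes a JSON object whose keys replace fields on the model's Hugging Face config before vLLM builds the model. A minimal sketch of that merge, using a plain dict as a hypothetical stand-in for the real :code:`transformers` config object:

```python
import json

# Hypothetical plain-dict stand-in for the model's Hugging Face config;
# in vLLM the real object is a transformers PretrainedConfig.
hf_config = {"model_type": "qwen2", "is_causal": True}

# The JSON string passed on the command line via --hf-overrides.
overrides = json.loads('{"is_causal": false}')

# Each override key replaces the corresponding config field, so the model
# is constructed with bi-directional (non-causal) attention.
hf_config.update(overrides)
print(hf_config["is_causal"])  # False
```

This is only a sketch of the override semantics; the actual flag is handled by vLLM's engine argument parsing.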
@@ -606,10 +613,10 @@ Text Generation
 | :sup:`+` Multiple items can be inputted per text prompt for this modality.
 
 .. note::
   vLLM currently only supports adding LoRA to the language backbone of multimodal models.
 
 .. note::
-  For :code:`openbmb/MiniCPM-V-2`, the official repo doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
+  The official :code:`openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
   For more details, please see: https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630
 
 Multimodal Embedding