[New Model]: support GTE NewModel (#17986)
@@ -701,12 +701,22 @@ Specified using `--task embed`.
   * ✅︎
   * ✅︎
 - * `GteModel`
-  * GteModel
+  * Arctic-Embed-2.0-M
   * `Snowflake/snowflake-arctic-embed-m-v2.0`.
   *
   * ︎
+- * `GteNewModel`
+  * mGTE-TRM (see note)
+  * `Alibaba-NLP/gte-multilingual-base`, etc.
+  * ︎
+  * ︎
+- * `ModernBertModel`
+  * ModernBERT-based
+  * `Alibaba-NLP/gte-modernbert-base`, etc.
+  * ︎
+  * ︎
 - * `NomicBertModel`
-  * NomicBertModel
+  * Nomic BERT
   * `nomic-ai/nomic-embed-text-v1`, `nomic-ai/nomic-embed-text-v2-moe`, `Snowflake/snowflake-arctic-embed-m-long`, etc.
   * ︎
   * ︎
@@ -749,6 +759,10 @@ See [relevant issue on HF Transformers](https://github.com/huggingface/transform
 `jinaai/jina-embeddings-v3` supports multiple tasks through LoRA, while vLLM currently supports only the text-matching task, by merging the LoRA weights.
 :::
 
+:::{note}
+The second-generation GTE model (mGTE-TRM) is named `NewModel`. Since the name `NewModel` is too generic, you should set `--hf-overrides '{"architectures": ["GteNewModel"]}'` to specify the use of the `GteNewModel` architecture.
+:::
+
 If your model is not in the above list, we will try to automatically convert the model using
 {func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
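The architecture override from the new note can be passed when launching the server. A minimal sketch, using the `Alibaba-NLP/gte-multilingual-base` checkpoint from the table above (adjust the model and any other serving flags to your deployment):

```shell
# mGTE-TRM checkpoints declare the generic `NewModel` architecture in their
# config, so map it to vLLM's `GteNewModel` implementation explicitly.
vllm serve Alibaba-NLP/gte-multilingual-base \
  --task embed \
  --hf-overrides '{"architectures": ["GteNewModel"]}'
```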
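The default conversion described in the last paragraph (extract the hidden state of the last token and L2-normalize it) can be sketched as below. The helper name and the toy hidden states are illustrative, not vLLM's actual internals:

```python
import math

def last_token_embedding(hidden_states):
    """Sketch of the default pooling: take the last token's hidden state
    and L2-normalize it so the embedding has unit length.
    `hidden_states` is a list of per-token vectors (one per prompt token)."""
    last = hidden_states[-1]
    norm = math.sqrt(sum(x * x for x in last))
    return [x / norm for x in last]

# Toy example: a 3-token prompt with 4-dimensional hidden states.
states = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],  # only this last vector contributes
]
emb = last_token_embedding(states)
# emb is the normalized last hidden state: [0.6, 0.8, 0.0, 0.0]
```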