[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage (#18683)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Author: Cyrus Leung (committed by GitHub)
Date:   2025-05-26 11:27:50 +08:00
parent 6071e989df
commit fba0642704
3 changed files with 2 additions and 18 deletions

@@ -404,10 +404,7 @@ Specified using `--task embed`.
     You should manually set mean pooling by passing `--override-pooler-config '{"pooling_type": "MEAN"}'`.
 
 !!! note
-    The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
-    you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.
-
-    For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+    For `Alibaba-NLP/gte-Qwen2-*`, you need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
     See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 
 !!! note
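
The updated note boils down to two settings when loading `Alibaba-NLP/gte-Qwen2-*` as an embedding model. Below is a minimal offline-inference sketch using vLLM's Python `LLM` API; the prompt is illustrative, and the kwargs mirror the CLI flags quoted in the diff (`--task embed`, `--trust-remote-code`, `--override-pooler-config '{"pooling_type": "MEAN"}'`).

```python
# Minimal sketch: load gte-Qwen2 as an embedding model with mean pooling
# and trust_remote_code enabled so the correct tokenizer is loaded.
from vllm import LLM
from vllm.config import PoolerConfig

llm = LLM(
    model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    task="embed",                # same as --task embed
    trust_remote_code=True,      # same as --trust-remote-code
    override_pooler_config=PoolerConfig(pooling_type="MEAN"),
)

# Embed a sample prompt and inspect the vector dimensionality.
outputs = llm.embed(["What is the capital of France?"])
print(len(outputs[0].outputs.embedding))
```

The server-mode equivalent uses the flags exactly as written in the doc: `vllm serve Alibaba-NLP/gte-Qwen2-1.5B-instruct --task embed --override-pooler-config '{"pooling_type": "MEAN"}' --trust-remote-code`.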