Truncation control for embedding models (#14776)

Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-04-29 22:24:57 -03:00
parent 4055130a85
commit 1c2bc7ead0
21 changed files with 333 additions and 71 deletions
--- a/vllm/sampling_params.py
+++ b/vllm/sampling_params.py
@@ -186,9 +186,10 @@ class SamplingParams(
        logits_processors: list of functions that modify logits based on
            previously generated tokens, and optionally prompt tokens as
            a first argument.
-        truncate_prompt_tokens: If set to an integer k, will use only the last k
-            tokens from the prompt (i.e., left truncation). Defaults to None
-            (i.e., no truncation).
+        truncate_prompt_tokens: If set to -1, will use the truncation size 
+            supported by the model. If set to an integer k, will use only 
+            the last k tokens from the prompt (i.e., left truncation). 
+            Defaults to None (i.e., no truncation).
        guided_decoding: If provided, the engine will construct a guided
            decoding logits processor from these parameters. Defaults to None.
        logit_bias: If provided, the engine will construct a logits processor
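
Note on the hunk above: the updated docstring describes three cases for truncate_prompt_tokens (None, -1, and a positive integer k). The following is a minimal sketch of those semantics, not the code from this commit. The helper name `apply_truncation` and the `max_model_len` argument are hypothetical, and it assumes that "the truncation size supported by the model" means the model's maximum input length.

```python
from typing import Optional


def apply_truncation(
    token_ids: list[int],
    truncate_prompt_tokens: Optional[int],
    max_model_len: int,
) -> list[int]:
    """Illustrative helper (hypothetical, not part of vLLM)."""
    # None: no truncation, the prompt is passed through unchanged.
    if truncate_prompt_tokens is None:
        return token_ids

    # -1: fall back to the model's own limit (assumed here to be max_model_len).
    k = max_model_len if truncate_prompt_tokens == -1 else truncate_prompt_tokens

    # Guard against 0 and other negatives; token_ids[-0:] would keep everything.
    if k < 1:
        raise ValueError("truncate_prompt_tokens must be -1 or a positive integer")

    # Left truncation: keep only the last k tokens.
    return token_ids[-k:]


prompt = [101, 7592, 2088, 2003, 102]
print(apply_truncation(prompt, None, 3))  # no truncation
print(apply_truncation(prompt, -1, 3))    # last 3 tokens (model limit)
print(apply_truncation(prompt, 2, 3))     # last 2 tokens
```

When k exceeds the prompt length, the slice returns the whole prompt, so a short prompt is left untouched in every case.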