[V0 Deprecation] Remove pooling model support in V0 (#23434)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
This commit is contained in:
Maximilien de Bayser
2025-08-29 04:04:02 -03:00
committed by GitHub
parent 934bebf192
commit 2554b27baa
38 changed files with 99 additions and 808 deletions

View File

@@ -1156,8 +1156,7 @@ class LLM:
tokenization_kwargs=tokenization_kwargs,
)
if envs.VLLM_USE_V1 and (token_type_ids := engine_prompt.pop(
"token_type_ids", None)):
if (token_type_ids := engine_prompt.pop("token_type_ids", None)):
params = pooling_params.clone()
compressed = compress_token_type_ids(token_type_ids)
params.extra_kwargs = {"compressed_token_type_ids": compressed}