[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)

Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Robert Shaw
2025-03-05 14:18:55 +00:00
committed by GitHub
parent 47d4a7e004
commit 257e200a25
3 changed files with 201 additions and 17 deletions
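
The FIXME added in the hunk below describes an inverted mask: -inf is applied to the allowed tokens rather than to the disallowed ones. As a rough illustration of the intended semantics only, here is a minimal pure-Python sketch. The helper `mask_disallowed` and its `allowed_by_row` argument are hypothetical names for this example and are not part of vLLM; the real code works on batched tensors, which this sketch does not attempt to reproduce.

```python
import math

NEG_INF = -math.inf


def mask_disallowed(logits, allowed_by_row):
    """Return a copy of `logits` with disallowed tokens set to -inf.

    logits: list of per-request rows, each a list of floats.
    allowed_by_row: dict mapping a row index to the set of token ids that
        request may sample. Rows missing from the dict are unrestricted.
    (Both the function and its arguments are hypothetical, for illustration.)
    """
    masked = []
    for row_index, row in enumerate(logits):
        allowed = allowed_by_row.get(row_index)
        if not allowed:
            # No restriction for this request: leave its logits untouched.
            masked.append(list(row))
            continue
        # Keep the allowed tokens and send every other token to -inf.
        masked.append([
            value if token_id in allowed else NEG_INF
            for token_id, value in enumerate(row)
        ])
    return masked


logits = [[0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 3.0, 4.0]]
# Only request 0 restricts sampling, to token ids 1 and 3.
masked = mask_disallowed(logits, {0: {1, 3}})
```

In this sketch the restriction is looked up per row, so a request without `allowed_token_ids` keeps its original logits, which is the property the FIXME says a naive inversion of the shared mask would break.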

@@ -298,6 +298,11 @@ class InputBatch:
if sampling_params.logit_bias is not None:
    self.logit_bias[req_index] = sampling_params.logit_bias

# FIXME: this implementation is incorrect. We create this mask and
# then apply -inf to exactly these tokens, which means the allowed
# tokens can never be selected! We cannot simply invert the mask,
# since that would impact requests that do not set allowed_token_ids.
# This feature is currently disabled on V1 (we reject in Processor).
if sampling_params.allowed_token_ids:
    self.has_allowed_token_ids.add(req_id)
    if self.allowed_token_ids_mask_cpu_tensor is None: