[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Jialin Ouyang
2025-11-18 18:10:02 -08:00
committed by GitHub
parent da94c7c0eb
commit 40b6b38f2c
6 changed files with 33 additions and 41 deletions


@@ -204,6 +204,12 @@ class SamplingParams(
prompt_logprobs: int | None = None
"""Number of log probabilities to return per prompt token.
When set to -1, return all `vocab_size` log probabilities."""
flat_logprobs: bool = False
"""Whether to return logprobs in a flattened format (i.e. FlatLogprobs)
for better performance.
NOTE: The GC cost of FlatLogprobs is significantly smaller than that of
list[dict[int, Logprob]]. When enabled, PromptLogprobs and
SampleLogprobs are populated as FlatLogprobs."""
# NOTE: This parameter is only exposed at the engine level for now.
# It is not exposed in the OpenAI API server, as the OpenAI API does
# not support returning only a list of token IDs.