[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
@@ -204,6 +204,12 @@ class SamplingParams(
prompt_logprobs: int | None = None
"""Number of log probabilities to return per prompt token.
When set to -1, return all `vocab_size` log probabilities."""
flat_logprobs: bool = False
"""Whether to return logprobs in a flattened format (i.e. FlatLogprobs)
for better performance.
NOTE: The GC cost of FlatLogprobs is significantly smaller than that of
list[dict[int, Logprob]]. When enabled, PromptLogprobs and
SampleLogprobs are populated as FlatLogprobs."""
# NOTE: This parameter is only exposed at the engine level for now.
# It is not exposed in the OpenAI API server, as the OpenAI API does
# not support returning only a list of token IDs.
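
The docstring's GC note can be illustrated with a small standalone sketch. It is not vLLM's actual FlatLogprobs implementation: `Logprob` here is a simplified stand-in, and `FlatLogprobsSketch` and `count_containers` are hypothetical names introduced only for this example. The idea shown is that a list of per-position dicts allocates one dict per position plus one record per candidate token, while a flat layout keeps a fixed number of parallel lists however many positions are stored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Logprob:
    """Simplified stand-in for a per-token logprob record (assumption)."""
    logprob: float
    rank: Optional[int] = None


@dataclass
class FlatLogprobsSketch:
    """Hypothetical flat layout: parallel lists instead of one dict per position."""
    start_indices: list[int] = field(default_factory=list)
    end_indices: list[int] = field(default_factory=list)
    token_ids: list[int] = field(default_factory=list)
    logprobs: list[float] = field(default_factory=list)
    ranks: list[Optional[int]] = field(default_factory=list)

    def append(self, position: dict[int, Logprob]) -> None:
        # Record where this position's entries begin and end in the flat lists.
        self.start_indices.append(len(self.token_ids))
        for token_id, lp in position.items():
            self.token_ids.append(token_id)
            self.logprobs.append(lp.logprob)
            self.ranks.append(lp.rank)
        self.end_indices.append(len(self.token_ids))

    def __len__(self) -> int:
        return len(self.start_indices)

    def __getitem__(self, index: int) -> dict[int, Logprob]:
        # Rebuild the dict view for one position on demand.
        lo, hi = self.start_indices[index], self.end_indices[index]
        return {
            self.token_ids[i]: Logprob(self.logprobs[i], self.ranks[i])
            for i in range(lo, hi)
        }


def count_containers(obj) -> int:
    """Count list, dict, and record objects reachable from obj.

    Plain ints, floats, and None are skipped, since the garbage
    collector does not track them.
    """
    if isinstance(obj, dict):
        return 1 + sum(count_containers(v) for v in obj.values())
    if isinstance(obj, list):
        return 1 + sum(count_containers(v) for v in obj)
    if isinstance(obj, Logprob):
        return 1
    if isinstance(obj, FlatLogprobsSketch):
        return 1 + sum(count_containers(v) for v in vars(obj).values())
    return 0


# 100 positions with 5 candidate tokens each, in the nested layout.
nested = [
    {pos * 5 + k: Logprob(-0.1 * (k + 1), k + 1) for k in range(5)}
    for pos in range(100)
]

# The same data in the flat layout.
flat = FlatLogprobsSketch()
for position in nested:
    flat.append(position)

print(count_containers(nested))  # 601: 1 list + 100 dicts + 500 records
print(count_containers(flat))    # 6: 1 container + 5 parallel lists
```

Per the diff above, the real behaviour is requested by setting `flat_logprobs=True` on `SamplingParams`; the sketch only shows why a flat layout leaves the garbage collector far fewer objects to traverse.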