[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077)

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
Pradyun92
2025-10-17 16:27:47 -04:00
committed by GitHub
parent d29483b58a
commit acedc74b1a
5 changed files with 22 additions and 6 deletions


@@ -15,7 +15,7 @@ from vllm.v1.spec_decode.metadata import SpecDecodeMetadata
 logger = init_logger(__name__)
 PLACEHOLDER_TOKEN_ID: tl.constexpr = -1
-GREEDY_TEMPERATURE: tl.constexpr = -1
+GREEDY_TEMPERATURE: tl.constexpr = 0
 # Maximum number of speculative draft tokens allowed per request in a single
 # step. This value is chosen to be large enough to handle typical use cases.
 MAX_SPEC_LEN = 128
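The hunk changes only the sentinel that the rejection sampler compares a request's temperature against to decide whether the request is greedy. The pure-Python sketch below illustrates the effect of that one-character change. It rests on an assumption, not on anything shown in the diff: after the sampler refactor, greedy requests are stored with temperature `0.0` rather than `-1`. `route_requests` is a hypothetical, simplified stand-in for the per-request check, not vLLM's Triton kernel code.

```python
from typing import List

# Sentinel values taken from the diff: the old check compared against -1,
# the fixed check compares against 0.
OLD_GREEDY_TEMPERATURE = -1
NEW_GREEDY_TEMPERATURE = 0


def route_requests(temperatures: List[float], greedy_sentinel: float) -> List[str]:
    """Hypothetical helper: label each request 'greedy' or 'random' by
    comparing its temperature with the greedy sentinel."""
    return [
        "greedy" if t == greedy_sentinel else "random" for t in temperatures
    ]


# Assumption: after the refactor, greedy requests carry temperature 0.0.
temperatures = [0.0, 0.8, 0.0, 1.0]

# With the stale sentinel, no request matches, so greedy requests are missed.
print(route_requests(temperatures, OLD_GREEDY_TEMPERATURE))
# ['random', 'random', 'random', 'random']

# With the updated sentinel, the greedy requests are detected again.
print(route_requests(temperatures, NEW_GREEDY_TEMPERATURE))
# ['greedy', 'random', 'greedy', 'random']
```

Under this assumption, a sentinel that no longer matches the stored value silently routes greedy requests down the non-greedy path instead of raising an error, which is why the constant has to track however the sampler encodes greedy requests.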