A1: Add ◇ (think_start) priming after Assistant token
DSV4 is a reasoning model. The standard prompt format is:

    BOS <|User|> prompt <|Assistant|> ◇

Without the ◇ priming, the model is out-of-distribution: it expects to be inside a thinking block but never received the sentinel. This causes degenerate output from step 0 (France instead of Paris, looping on newlines/repeated tokens).

With ◇, the model will:

1. Generate thinking content (reasoning)
2. Emit ◇ (think_end=128822) to close the thinking block
3. Produce the actual answer
4. Emit EOS (token 1)

This matches the pattern described in the Kimi K2 accuracy blog (https://vllm.ai/blog/2025-10-28-kimi-k2-accuracy): malformed prompt formatting is the #1 cause of degenerate output in chat-tuned reasoning models.
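The prompt layout described above can be sketched as a small helper. This is only an illustration, not the script's actual code: the numeric ids for BOS, the User and Assistant tokens, and think_start are made-up placeholders (the commit message gives only think_end=128822 and EOS=1), and `toy_encode` is a hypothetical stand-in for the real tokenizer.

```python
from typing import Callable, List

# Hypothetical placeholder ids, chosen only for illustration.
# The real values come from the model's tokenizer and are not stated here.
BOS = 0
USER_TOKEN = 9001
ASSISTANT_TOKEN = 9002
THINK_START = 9003


def build_prompt_ids(
    prompt: str,
    encode: Callable[[str], List[int]],
    prime_thinking: bool = True,
) -> List[int]:
    """Build: BOS <|User|> prompt <|Assistant|> [think_start]."""
    ids = [BOS, USER_TOKEN]
    # The prompt text is encoded without any extra special tokens.
    ids += encode("\n\n" + prompt)
    ids.append(ASSISTANT_TOKEN)
    if prime_thinking:
        # Open the thinking block so generation starts in-distribution.
        ids.append(THINK_START)
    return ids


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer: one id per character (assumption, not a real API)."""
    return [ord(ch) for ch in text]


primed = build_prompt_ids("What is the capital of France?", toy_encode)
unprimed = build_prompt_ids(
    "What is the capital of France?", toy_encode, prime_thinking=False
)
```

The `prime_thinking` flag is there only to make the before/after difference easy to compare: the two sequences are identical except for the final think_start id.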
@@ -1361,6 +1361,13 @@ def main():
     input_ids = [bos, USER_TOKEN]
     input_ids += tokenizer.encode('\n\n' + PROMPT, add_special_tokens=False)
     input_ids.append(ASSISTANT_TOKEN)
+    # DSV4 reasoning model: must prime with ◇ (think_start) after Assistant token.
+    # Without this, the model is out-of-distribution: it expects to be inside a
+    # thinking block but never received the think-start sentinel.
+    # Symptom: degenerate output from step 0 (e.g. "France" instead of "Paris",
+    # looping on newlines/repeated tokens). With ◇, the model generates thinking
+    # content, emits ◇ (think_end), then produces the actual answer.
+    input_ids.append(THINK_START)
     generated = input_ids
     all_tokens = generated.copy()
     print(f"Input: {len(generated)} tokens")
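The expected output shape (thinking content, think_end, answer, EOS) suggests a simple way to separate reasoning from the final answer once generation finishes. The following is a minimal sketch under that assumption; `split_generation` is a hypothetical helper, not part of this commit, and it operates only on the tokens produced after the prompt. The two ids are the ones quoted in the commit message.

```python
from typing import List, Tuple

THINK_END = 128822  # think_end id as given in the commit message
EOS = 1             # EOS id as given in the commit message


def split_generation(new_tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Split newly generated token ids into (thinking, answer).

    Everything from the first EOS onward is dropped. If think_end never
    appears, all tokens are treated as unfinished thinking and the answer
    is empty.
    """
    if EOS in new_tokens:
        new_tokens = new_tokens[: new_tokens.index(EOS)]
    if THINK_END in new_tokens:
        cut = new_tokens.index(THINK_END)
        return new_tokens[:cut], new_tokens[cut + 1 :]
    return list(new_tokens), []


thinking, answer = split_generation([10, 11, THINK_END, 20, 21, EOS, 99])
open_thinking, no_answer = split_generation([10, 11, 12])
```

An empty answer is a useful signal in its own right: it means the model hit the token budget or looped before closing the thinking block.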