vllm-kimi25-eagle

biondizzle/vllm-kimi25-eagle

Commit Graph

Author	SHA1	Message	Date

biondizzle

9051c610d2

Fix reasoning parser for multi-turn conversations

The streaming path was using is_reasoning_end(previous_token_ids) to
check whether reasoning had ended. In multi-turn conversations,
previous_token_ids includes the entire chat history, including the
think-end tokens from prior assistant messages. As a result, the parser
concluded that reasoning was already over before the model had generated
anything, and routed all thinking text to content instead of reasoning.
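To make the failure mode concrete, here is a minimal Python sketch. The token IDs, the body of `is_reasoning_end`, and the sample sequences are hypothetical stand-ins, not vLLM's or Kimi's actual values; the check simply looks for a think-end ID anywhere in the sequence, which is the behavior the message above describes.

```python
# Hypothetical token ID for the think-end marker; the real ID depends on the
# Kimi tokenizer and is NOT taken from this commit.
THINK_START_ID = 1001
THINK_END_ID = 1002


def is_reasoning_end(token_ids: list) -> bool:
    """Simplified token-ID check (assumed): reasoning counts as over if a
    think-end token appears anywhere in token_ids."""
    return THINK_END_ID in token_ids


# Chat history for turn 2: it already contains a completed assistant turn
# with its own think-start ... think-end span.
history = [5, 6, THINK_START_ID, 7, 8, THINK_END_ID, 9, 10]

# The first reasoning tokens the model generates in the new turn.
generated_so_far = [11, 12]

previous_token_ids = history + generated_so_far

# The check fires on the OLD think-end token from the history, so the new
# thinking text would be routed to content instead of reasoning.
print(is_reasoning_end(previous_token_ids))  # True (wrong for this turn)
print(is_reasoning_end(generated_so_far))    # False
```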

Fix: Replace the token-ID-based check with a text-based state variable
(_reasoning_ended) that tracks the end of reasoning based solely on what
the model has generated in the current turn. The flag is reset at the
start of each new generation. This commit also includes the chat
template for reference.
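The fix can be sketched as a small standalone parser. This is an illustrative sketch, not the code from this commit: the class name, the `feed` and `reset` methods, the `DeltaMessage` shape, the `run` helper, and the `</think>` end marker are all assumptions. Only the `_reasoning_ended` flag and the reset-per-generation behavior come from the message above.

```python
from dataclasses import dataclass
from typing import Optional

THINK_END = "</think>"  # assumed end marker; the actual Kimi marker may differ


@dataclass
class DeltaMessage:
    reasoning: Optional[str] = None
    content: Optional[str] = None


class StreamingReasoningParser:
    """Hypothetical parser that routes each delta to reasoning or content
    using only text generated in the current turn, never prompt token IDs."""

    def __init__(self) -> None:
        self._reasoning_ended = False

    def reset(self) -> None:
        # Called at the start of every new generation.
        self._reasoning_ended = False

    def feed(self, previous_text: str, delta_text: str) -> DeltaMessage:
        # previous_text: text generated so far in THIS turn (no chat history).
        if self._reasoning_ended:
            return DeltaMessage(content=delta_text)

        current_text = previous_text + delta_text
        end = current_text.find(THINK_END)
        if end == -1:
            return DeltaMessage(reasoning=delta_text)

        # The end marker has appeared: flip the state and split this delta.
        self._reasoning_ended = True
        prev_len = len(previous_text)
        reasoning_part = current_text[prev_len:end]  # "" if marker began earlier
        content_part = current_text[max(end + len(THINK_END), prev_len):]
        return DeltaMessage(reasoning=reasoning_part or None,
                            content=content_part or None)


def run(parser: StreamingReasoningParser, deltas: list) -> tuple:
    """Stream deltas through the parser; return (reasoning, content)."""
    parser.reset()
    previous = ""
    reasoning, content = [], []
    for delta in deltas:
        msg = parser.feed(previous, delta)
        if msg.reasoning:
            reasoning.append(msg.reasoning)
        if msg.content:
            content.append(msg.content)
        previous += delta
    return "".join(reasoning), "".join(content)


print(run(StreamingReasoningParser(),
          ["Let me think", " about it.", "</think>", "The answer is 4."]))
# ('Let me think about it.', 'The answer is 4.')
```

Because the decision depends only on text generated in the current turn, think-end markers in the chat history never affect it. One simplification: if the end marker is split across two deltas, the partial marker from the earlier delta is passed through as reasoning text; a production parser would buffer possible marker prefixes.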

2026-04-14 07:46:33 +00:00

1 commit