Instead of always returning False (which broke tool-call streaming), use a heuristic based on where think-end sits in the token IDs. If think-end appears but is followed by more than 3 tokens (chat-template wrapping such as <|im_end|>, user-turn markers, etc.), it comes from a prior turn's prompt, and reasoning has not yet started in the current generation, so return False. If think-end is at or near the end (3 or fewer trailing tokens), it comes from the generated tokens and reasoning has ended, so return True.
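A minimal sketch of the heuristic, for illustration only. The function name `is_reasoning_end`, the parameter `think_end_id`, and the constant `MAX_TRAILING_TOKENS` are hypothetical. Two further choices are assumptions not stated above: the check uses the *last* occurrence of think-end, and it returns False when think-end is absent.

```python
from collections.abc import Sequence

# Hypothetical constant: how many tokens may follow think-end before we
# treat it as chat-template wrapping from a prior turn ("more than 3").
MAX_TRAILING_TOKENS = 3


def is_reasoning_end(input_ids: Sequence[int], think_end_id: int) -> bool:
    """Sketch of the trailing-token heuristic (hypothetical helper).

    Returns True only if think-end occurs and its last occurrence is
    followed by at most MAX_TRAILING_TOKENS tokens.
    """
    # Scan from the right so the most recent think-end is the one we judge.
    for idx in range(len(input_ids) - 1, -1, -1):
        if input_ids[idx] == think_end_id:
            trailing = len(input_ids) - idx - 1
            # Few trailing tokens: think-end was generated, so reasoning ended.
            # Many trailing tokens: it belongs to an earlier turn's prompt.
            return trailing <= MAX_TRAILING_TOKENS
    # Assumption: no think-end at all means reasoning has not ended.
    return False
```

Scanning for the last occurrence means a think-end that the model has just generated is still recognized even when older think-end tokens appear earlier in the prompt.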