The streaming path used is_reasoning_end(previous_token_ids) to check
whether reasoning had ended. In multi-turn conversations,
previous_token_ids covers the entire chat history, including
think-end tokens from prior assistant messages. The parser therefore
concluded that reasoning was already over before the model had
generated anything, and routed all thinking text to content instead of
reasoning.

Fix: Replace the token-ID-based check with a text-based state variable
(_reasoning_ended) that tracks the end of reasoning based solely on
what the model has generated in the current turn. The variable is
reset at the start of each new generation.
This change also includes the chat template for reference.
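
A minimal sketch of the per-turn state idea described above, not the
parser's actual code. The class name, the feed()/reset() methods, and the
"</think>" marker are assumptions made for illustration; only
_reasoning_ended comes from this change. For brevity it also assumes the
end marker arrives whole within a single delta.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed think-end marker; the real one is defined by the chat template.
THINK_END = "</think>"


@dataclass
class TurnReasoningState:
    """Hypothetical per-generation state holder (illustration only)."""

    _reasoning_ended: bool = False

    def reset(self) -> None:
        # Called at the start of every new generation, so think-end
        # markers from earlier turns can never leak into this one.
        self._reasoning_ended = False

    def feed(self, delta_text: str) -> Tuple[str, str]:
        """Split one streamed chunk into (reasoning, content)."""
        if self._reasoning_ended:
            # Reasoning already closed in THIS turn: everything is content.
            return "", delta_text
        before, marker, after = delta_text.partition(THINK_END)
        if marker:
            # The model itself emitted the end marker in this turn.
            self._reasoning_ended = True
        return before, after
```

Because the decision depends only on text fed in since the last reset(),
the contents of the prompt (including prior assistant turns) cannot flip
the flag.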

Tool parser:
- Case 3/4: return None instead of DeltaMessage(content='') when
inside an open tool section with no parseable content yet.
Empty-string content deltas pollute the response and break the
content=null vs content='' contract with non-streaming.
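
To illustrate why the distinction matters, here is a hedged sketch of a
consumer that aggregates streamed deltas. The DeltaMessage class below is
a stand-in with only a content field, and collect_content is a
hypothetical helper, not part of the parser.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DeltaMessage:
    """Stand-in for the real DeltaMessage; only the field used here."""

    content: Optional[str] = None


def collect_content(deltas: Iterable[Optional[DeltaMessage]]) -> Optional[str]:
    """Aggregate streamed deltas the way a client might (hypothetical)."""
    content = None
    for delta in deltas:
        if delta is None:
            # The parser had nothing to emit for this chunk.
            continue
        if delta.content is not None:
            content = (content or "") + delta.content
    return content
```

With None deltas the aggregated content stays None, which matches a
non-streaming response that has no content. A single
DeltaMessage(content='') turns it into an empty string instead.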

Reasoning parser:
- Suppress tool-calls section markers from content forwarding.
The tool parser detects them via current_text re-parsing; forwarding
them as content causes double-handling.
- Already-past-reasoning path: strip section markers from content
for the same reason.
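
The marker stripping can be sketched as follows. The marker strings and
the strip_section_markers helper are placeholders assumed for
illustration; the real markers are whatever the tool parser recognises.
The sketch also assumes each marker arrives whole within one chunk.

```python
from typing import Optional

# Placeholder marker strings (assumed); substitute the model's real ones.
SECTION_MARKERS = (
    "<|tool_calls_section_begin|>",
    "<|tool_calls_section_end|>",
)


def strip_section_markers(text: str) -> Optional[str]:
    """Remove tool-calls section markers before forwarding content."""
    for marker in SECTION_MARKERS:
        text = text.replace(marker, "")
    # Return None rather than '' so no empty content delta is emitted.
    return text or None
```

The tool parser still sees the markers through its own re-parsing of
current_text, so dropping them here only removes the duplicate handling.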