When the model runs out of tokens while still reasoning (no think-end token emitted), all generated text goes to the reasoning field and content stays empty, so the model appears silent to the client. Streaming fix: before the finish chunk, yield an extra content delta carrying the extracted reasoning text, so the client can see the output. Non-streaming fix: move reasoning to content when finish_reason=length and content is None. Also adds the patched serving.py to the Dockerfile.
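A minimal sketch of both fallbacks, not the actual serving.py patch. It assumes responses are plain dicts with hypothetical `content`, `reasoning`, `delta`, and `finish_reason` keys; the function names `apply_length_fallback` and `stream_with_fallback` are illustrative only, and the real field names in the server may differ.

```python
from typing import Iterator, Optional


def apply_length_fallback(message: dict, finish_reason: Optional[str]) -> dict:
    """Non-streaming: move reasoning into content when the token limit
    was hit and no content was produced. Returns the input unchanged otherwise."""
    if finish_reason == "length" and message.get("content") is None:
        reasoning = message.get("reasoning")
        if reasoning:
            patched = dict(message)  # copy so the caller's dict is not mutated
            patched["content"] = reasoning
            patched["reasoning"] = None
            return patched
    return message


def stream_with_fallback(chunks: Iterator[dict]) -> Iterator[dict]:
    """Streaming: if the stream ends with finish_reason == "length" and no
    content delta was ever sent, emit one extra content delta holding the
    accumulated reasoning text just before the finish chunk."""
    reasoning_parts = []
    saw_content = False
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if delta.get("content"):
            saw_content = True
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])
        if (
            chunk.get("finish_reason") == "length"
            and not saw_content
            and reasoning_parts
        ):
            # Extra delta so the client sees some output (assumed chunk shape).
            yield {
                "delta": {"content": "".join(reasoning_parts)},
                "finish_reason": None,
            }
        yield chunk
```

In this sketch, responses that finish normally (for example with `finish_reason == "stop"`) or that already carry content pass through untouched, so the fallback only changes the truncated-while-reasoning case.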