Commit Graph

6 Commits

SHA1 Message Date
3f2708a095 keep everything .py 2026-04-14 09:51:51 +00:00
043f51322f Patch vLLM serving layer to flush reasoning on finish_reason=length
When the model runs out of tokens while still reasoning (no think-end
token emitted), all generated text lands in the reasoning field and
content stays empty, so the model appears silent to the client.

Streaming fix: yield an extra content delta with the extracted reasoning
text before the finish chunk, so the client can see the output.

Non-streaming fix: move reasoning to content when finish_reason=length
and content is None.

Also adds the patched serving.py to the Dockerfile.
2026-04-14 09:49:45 +00:00
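The two fixes described in commit 043f51322f can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM serving code: the dict shapes mirror the OpenAI-style response format, and the helper names `flush_reasoning_on_length` and `flush_reasoning_stream` are assumptions made for the example.

```python
def flush_reasoning_on_length(choice: dict) -> dict:
    """Non-streaming fix: if generation was cut off mid-reasoning
    (finish_reason == "length") and no visible content was produced,
    promote the extracted reasoning text to `content` so the client
    sees the model's output."""
    msg = choice.get("message", {})
    if (
        choice.get("finish_reason") == "length"
        and msg.get("content") is None
        and msg.get("reasoning_content")
    ):
        msg["content"] = msg.pop("reasoning_content")
    return choice


def flush_reasoning_stream(chunks):
    """Streaming fix: buffer reasoning deltas; if the stream ends with
    finish_reason == "length" and no content delta was ever sent, yield
    one extra content delta carrying the buffered reasoning text before
    the finish chunk."""
    reasoning = []
    saw_content = False
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            saw_content = True
        if chunk.get("finish_reason") == "length" and not saw_content and reasoning:
            yield {"delta": {"content": "".join(reasoning)}}
        yield chunk


# Example: a non-streaming response truncated while still inside the
# think block.
truncated = {
    "finish_reason": "length",
    "message": {"content": None, "reasoning_content": "Let me work out..."},
}
fixed = flush_reasoning_on_length(truncated)
print(fixed["message"]["content"])  # the reasoning text, no longer hidden
```

Both helpers leave normally finished responses untouched: the rewrite only triggers when `finish_reason` is `"length"` and no content was ever emitted, matching the behavior described in the commit message.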
d4568f1d80 more speculative decoding fixes 2026-04-14 05:06:30 +00:00
9be82d3574 add the tool call parser fixes for eagle decode 2026-04-14 03:13:24 +00:00
ad76c78630 Install unzip for Eagle3 extraction, then remove it 2026-04-13 15:52:29 +00:00
3293502548 init commit 2026-04-13 15:24:48 +00:00