vllm-kimi25-eagle/Dockerfile
biondizzle 043f51322f Patch vLLM serving layer to flush reasoning on finish_reason=length
When the model runs out of tokens while still reasoning (no think-end
token emitted), all generated text is routed to the reasoning field and
the content field stays empty, so the model appears silent to the client.
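A minimal sketch of why this happens, assuming a reasoning parser that splits output at a single think-end marker. The function name `split_reasoning` and the marker string `</think>` are hypothetical; the real token depends on the model's chat template. If the marker never appears, everything lands in reasoning and content is None.

```python
from typing import Optional, Tuple

# Hypothetical marker; the actual think-end token is model/template specific.
THINK_END = "</think>"


def split_reasoning(
    text: str, end_marker: str = THINK_END
) -> Tuple[str, Optional[str]]:
    """Split raw model output into (reasoning, content) at the think-end marker."""
    idx = text.find(end_marker)
    if idx == -1:
        # No think-end emitted (e.g. generation hit the token limit):
        # all text is treated as reasoning and content stays None.
        return text, None
    reasoning = text[:idx]
    content = text[idx + len(end_marker):]
    # Treat an empty tail as "no content" rather than an empty string.
    return reasoning, content or None
```

For example, `split_reasoning("plan</think>Answer: 4")` gives `("plan", "Answer: 4")`, while a truncated `split_reasoning("Let me think")` gives `("Let me think", None)`, which is the "silent model" case.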

Streaming fix: yield an extra content delta with the extracted reasoning
text before the finish chunk, so the client can see the output.
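The streaming behavior can be sketched as a generator wrapper over OpenAI-style `chat.completion.chunk` dicts. This is an illustration, not the patched vLLM code: the wrapper name `flush_reasoning_on_length` is hypothetical, and it assumes a single choice per chunk whose delta carries the reasoning text under a `reasoning` key (the actual field name in a given vLLM version may differ).

```python
from typing import Any, Dict, Iterable, Iterator, List

Chunk = Dict[str, Any]


def flush_reasoning_on_length(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Re-emit buffered reasoning as content when a stream ends with finish_reason='length'.

    Assumes one choice per chunk and a delta that may carry 'reasoning'
    and/or 'content' keys (key names are assumptions).
    """
    reasoning_parts: List[str] = []
    saw_content = False
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])
        if delta.get("content"):
            saw_content = True
        if (choice.get("finish_reason") == "length"
                and not saw_content and reasoning_parts):
            # Extra content delta, emitted just before the finish chunk.
            yield {
                "choices": [{
                    "index": choice.get("index", 0),
                    "delta": {"content": "".join(reasoning_parts)},
                    "finish_reason": None,
                }]
            }
        yield chunk
```

Streams that already produced content, or that finish for another reason, pass through unchanged.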

Non-streaming fix: move reasoning to content when finish_reason=length
and content is None.
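For the non-streaming path, a minimal sketch operating on a plain dict shaped like one choice of an OpenAI-style chat completion. The helper name `move_reasoning_to_content` and the `reasoning` message key are assumptions; it follows the description above by moving (not copying) the text, so reasoning is cleared afterwards.

```python
from typing import Any, Dict


def move_reasoning_to_content(choice: Dict[str, Any]) -> Dict[str, Any]:
    """If generation stopped at the token limit with no content, promote reasoning to content.

    Mutates and returns the choice dict; 'reasoning' is an assumed key name.
    """
    message = choice["message"]
    if (choice.get("finish_reason") == "length"
            and message.get("content") is None
            and message.get("reasoning")):
        message["content"] = message["reasoning"]
        # "Move" rather than copy: the reasoning field is cleared.
        message["reasoning"] = None
    return choice
```

Choices that finished normally (`finish_reason="stop"`) or already have content are left untouched.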

Also adds the patched serving.py to the Dockerfile.
2026-04-14 09:49:45 +00:00

902 B