vllm-kimi25-eagle/Dockerfile at 043f51322f81f6e06e464204173a46fb3163d502


biondizzle 043f51322f Patch vLLM serving layer to flush reasoning on finish_reason=length

When the model hits the token limit while still reasoning (no think-end
emitted), all generated text goes to the reasoning field and content stays
empty, so the model appears silent to the client.

Streaming fix: yield an extra content delta with the extracted reasoning
text before the finish chunk, so the client can see the output.
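
A minimal sketch of the streaming behavior described above, not the actual patch to serving.py. It assumes each chunk is a plain dict with hypothetical keys "reasoning", "content", and "finish_reason"; vLLM's real delta objects differ, and `flush_reasoning_stream` is an illustrative name.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical chunk shape used only for this sketch; not vLLM's delta type.
Chunk = Dict[str, Optional[str]]


def flush_reasoning_stream(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Re-yield chunks, adding one content delta before a 'length' finish chunk
    when only reasoning text has been streamed so far."""
    reasoning_parts: List[str] = []
    saw_content = False

    for chunk in chunks:
        if chunk.get("reasoning"):
            reasoning_parts.append(chunk["reasoning"])
        if chunk.get("content"):
            saw_content = True

        truncated = chunk.get("finish_reason") == "length"
        if truncated and not saw_content and reasoning_parts:
            # Extra delta: expose the accumulated reasoning as content so the
            # client sees output instead of an empty reply.
            yield {
                "reasoning": None,
                "content": "".join(reasoning_parts),
                "finish_reason": None,
            }

        yield chunk
```

Streams that finish with "stop", or that already produced content, pass through unchanged in this sketch.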

Non-streaming fix: move reasoning to content when finish_reason=length
and content is None.
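
The non-streaming rule can be sketched as follows. This is an illustration under stated assumptions, not the code in the patched serving.py: `Message` is a hypothetical stand-in for the response message object, and clearing the reasoning field after the move is an assumption about what "move" means here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a chat response message; not vLLM's class."""
    content: Optional[str] = None
    reasoning: Optional[str] = None


def flush_reasoning_on_length(message: Message, finish_reason: Optional[str]) -> Message:
    """Move reasoning into content when generation was cut off by the token limit."""
    if finish_reason == "length" and message.content is None and message.reasoning:
        message.content = message.reasoning
        # Assumption: the reasoning field is cleared once its text is moved.
        message.reasoning = None
    return message
```

Responses that finished normally, or that already have content, are left untouched.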

Also adds the patched serving.py to the Dockerfile.

2026-04-14 09:49:45 +00:00

902 B
