Commit Graph

6 Commits

SHA1 Message Date
3f2708a095 keep everything .py 2026-04-14 09:51:51 +00:00
043f51322f Patch vLLM serving layer to flush reasoning on finish_reason=length
When the model runs out of tokens while still reasoning (no think-end
token emitted), all generated text lands in the reasoning field and
content stays empty, so the model appears silent to the client.

Streaming fix: yield an extra content delta with the extracted reasoning
text before the finish chunk, so the client can see the output.

Non-streaming fix: move reasoning to content when finish_reason=length
and content is None.

Also adds the patched serving.py to the Dockerfile.
2026-04-14 09:49:45 +00:00
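The two fixes described in commit 043f51322f can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM serving code: the dict shapes mirror the OpenAI-style response format, and the helper names `flush_reasoning_on_length` and `flush_reasoning_stream` are assumptions made for the example.

```python
def flush_reasoning_on_length(choice: dict) -> dict:
    """Non-streaming fix: if generation was cut off mid-reasoning
    (finish_reason == "length") and no visible content was produced,
    promote the extracted reasoning text to `content` so the client
    sees the model's output."""
    msg = choice.get("message", {})
    if (
        choice.get("finish_reason") == "length"
        and msg.get("content") is None
        and msg.get("reasoning_content")
    ):
        msg["content"] = msg.pop("reasoning_content")
    return choice


def flush_reasoning_stream(chunks):
    """Streaming fix: buffer reasoning deltas; if the stream ends with
    finish_reason == "length" and no content delta was ever sent, yield
    one extra content delta carrying the buffered reasoning text before
    the finish chunk."""
    reasoning = []
    saw_content = False
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            saw_content = True
        if chunk.get("finish_reason") == "length" and not saw_content and reasoning:
            yield {"delta": {"content": "".join(reasoning)}}
        yield chunk


# Example: a non-streaming response truncated while still inside the
# think block.
truncated = {
    "finish_reason": "length",
    "message": {"content": None, "reasoning_content": "Let me work out..."},
}
fixed = flush_reasoning_on_length(truncated)
print(fixed["message"]["content"])  # the reasoning text, no longer hidden
```

Both helpers leave normally finished responses untouched: the rewrite only triggers when `finish_reason` is `"length"` and no content was ever emitted, matching the behavior described in the commit message.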
d4568f1d80 more speculative decoding fixes 2026-04-14 05:06:30 +00:00
9be82d3574 add the tool call parser fixes for eagle decode 2026-04-14 03:13:24 +00:00
ad76c78630 Install unzip for Eagle3 extraction, then remove it 2026-04-13 15:52:29 +00:00
3293502548 init commit 2026-04-13 15:24:48 +00:00