vllm-to-sglang

Author	SHA1	Message	Date
biondizzle	7d9c4da2ee	not sure why we have a default tool parser	2026-04-13 17:49:44 +00:00
biondizzle	efc9dc33e7	dynamic arg translation, remove entrypoint.sh, update README	2026-04-12 21:23:26 +00:00
biondizzle	7c1ed0408b	fix: recursive _fix_schema to handle nested properties=[] at any depth	2026-04-12 20:52:44 +00:00
biondizzle	a9911386e0	strip guided_json, guided_regex too; fix parameters.properties array	2026-04-12 20:27:44 +00:00
biondizzle	ccedd3ecee	fix: add chat_template_kwargs to STRIP_PARAMS, fix parameters.properties array	2026-04-12 20:23:10 +00:00
biondizzle	c66511e16f	fix: handle parameters.properties being array, not just parameters itself	2026-04-12 20:17:06 +00:00
biondizzle	e03e41eb4f	fix vLLM/SGLang schema mismatch	2026-04-12 19:57:47 +00:00
biondizzle	7ecbac2dc0	Fix UnboundLocalError in health(), switch from on_event to lifespan	2026-04-12 19:41:08 +00:00
biondizzle	774964a4db	Add error dump logging: capture full request+response on 4xx/5xx from SGLang	2026-04-12 19:28:04 +00:00
biondizzle	db9231f796	Fix middleware: handle SGLang startup lag gracefully - Add /health endpoint that returns 503 until SGLang is ready - Background task polls SGLang until it accepts connections - Catch ConnectError/TimeoutException instead of crashing - Return 503 JSON error when SGLang backend is unavailable - haproxy health-checks middleware /health, which reflects SGLang state	2026-04-12 19:06:38 +00:00
biondizzle	bbe40ac8c0	Add middleware to strip vLLM-only params (logprobs/top_logprobs) before forwarding to SGLang. SGLang's Mistral tool-call parser rejects logprobs/top_logprobs with 422, while vLLM accepts them. Clients like OpenClaw send these by default. New architecture: haproxy (port N) → middleware (port N+2) → SGLang (port N+1). The middleware is a thin FastAPI app that strips incompatible params from chat completion request bodies and passes everything else through unchanged.	2026-04-12 18:58:37 +00:00
biondizzle	359aa94337	Update README: haproxy proxy layer, /health probe fix, current state	2026-04-12 18:27:06 +00:00
biondizzle	6476c9c12a	fix: content-length 16 not 15, remove 'timeout check' (not valid in haproxy 2.4 server line)	2026-04-12 17:29:08 +00:00
biondizzle	725e61d792	fix: haproxy 2.4 compat — use errorfile instead of http-request return. haproxy 2.4 (Ubuntu 22.04) doesn't support http-request return with payload/content-type syntax (that's 2.8+). Switch to errorfile-based stub responses: http-request deny deny_status N + errorfile N path.	2026-04-12 17:26:45 +00:00
biondizzle	1ddc08c88b	haproxy: intercept /health too — instant response based on backend state. SGLang's /health takes ~1.001s, racing the 1s k8s probe timeout. Now haproxy health-checks SGLang in the background (5s interval, 3s check timeout) and responds to /health probes instantly: 200 if the backend is up, 503 if not.	2026-04-12 17:21:04 +00:00
biondizzle	7fb373fdfc	Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang. SGLang now runs on port+1, and haproxy binds the original vLLM port. haproxy serves a stub /metrics endpoint (200, empty body) and passes all other traffic through to SGLang via raw TCP proxy.	2026-04-12 17:09:58 +00:00
biondizzle	dd3a981497	Log all received args to /tmp/vllm-shim.log	2026-04-12 04:37:24 +00:00
biondizzle	513f8bb5dd	we don't need to compile aiter	2026-04-12 04:16:50 +00:00
biondizzle	2ac7778c15	Rewrite README: explain the shim, current state, and how to adapt for other models	2026-04-12 03:07:43 +00:00
biondizzle	71f7fe0881	fix aiter	2026-04-12 02:56:27 +00:00
biondizzle	b6151ba5db	fix aiter	2026-04-12 02:47:33 +00:00
biondizzle	4d444bebbb	use a shim	2026-04-12 02:19:55 +00:00
biondizzle	c86fbe0166	Fix Jenkinsfile: agent any, nightly default, proper quoting	2026-04-12 00:22:29 +00:00
biondizzle	d71248d0f6	init commit	2026-04-11 23:39:36 +00:00

24 Commits