vllm-to-sglang

Author	SHA1	Message	Date
biondizzle	1ddc08c88b	haproxy: intercept /health too (instant response based on backend state). SGLang's /health takes ~1.001s, racing the 1s k8s probe timeout. Now haproxy health-checks SGLang in the background (5s interval, 3s check timeout) and responds to /health probes instantly: 200 if backend is up, 503 if not.	2026-04-12 17:21:04 +00:00
biondizzle	7fb373fdfc	Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang. SGLang now runs on port+1, haproxy binds the original vLLM port. haproxy serves a stub /metrics endpoint (200, empty body) and passes all other traffic through to SGLang via raw TCP proxy.	2026-04-12 17:09:58 +00:00
biondizzle	dd3a981497	Log all received args to /tmp/vllm-shim.log	2026-04-12 04:37:24 +00:00
biondizzle	513f8bb5dd	we don't need to compile aiter	2026-04-12 04:16:50 +00:00
biondizzle	2ac7778c15	Rewrite README: explain the shim, current state, and how to adapt for other models	2026-04-12 03:07:43 +00:00
biondizzle	71f7fe0881	fix aiter	2026-04-12 02:56:27 +00:00
biondizzle	b6151ba5db	fix aiter	2026-04-12 02:47:33 +00:00
biondizzle	4d444bebbb	use a shim	2026-04-12 02:19:55 +00:00
biondizzle	c86fbe0166	Fix Jenkinsfile: agent any, nightly default, proper quoting	2026-04-12 00:22:29 +00:00
biondizzle	d71248d0f6	init commit	2026-04-11 23:39:36 +00:00