vllm-to-sglang

biondizzle/vllm-to-sglang

Commit Graph

Author	SHA1	Message	Date

biondizzle

db9231f796

Fix middleware: handle SGLang startup lag gracefully

- Add /health endpoint that returns 503 until SGLang is ready
- Background task polls SGLang until it accepts connections
- Catch ConnectError/TimeoutException instead of crashing
- Return 503 JSON error when SGLang backend is unavailable
- haproxy health-checks middleware /health, which reflects SGLang state

2026-04-12 19:06:38 +00:00
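The readiness gating described in commit db9231f796 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class name `BackendReadiness`, the helper `tcp_probe`, and the response payloads are hypothetical, and the standard library's `OSError` and `asyncio.TimeoutError` stand in for the `ConnectError`/`TimeoutException` types named in the commit message (which look like HTTP-client exceptions). Wiring `health()` into an actual FastAPI route is left out.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple


async def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical probe: succeeds once the backend accepts a TCP connection."""
    _reader, writer = await asyncio.wait_for(
        asyncio.open_connection(host, port), timeout
    )
    writer.close()
    await writer.wait_closed()
    return True


class BackendReadiness:
    """Tracks whether the backend (assumed: SGLang) is accepting connections."""

    def __init__(self, probe: Callable[[], Awaitable[bool]], interval: float = 1.0):
        self._probe = probe
        self._interval = interval
        self.ready = False

    async def poll_until_ready(self) -> None:
        """Background task: retry the probe until it succeeds."""
        while not self.ready:
            try:
                self.ready = bool(await self._probe())
            except (OSError, asyncio.TimeoutError):
                # Backend still starting: treat as "not ready" instead of crashing.
                self.ready = False
            if not self.ready:
                await asyncio.sleep(self._interval)

    def health(self) -> Tuple[int, Dict[str, str]]:
        """Status code and JSON body a /health handler could return."""
        if self.ready:
            return 200, {"status": "ok"}
        return 503, {"status": "unavailable", "detail": "backend not ready"}
```

A load balancer that health-checks this endpoint would then keep the middleware out of rotation until the probe first succeeds, which matches the behaviour the commit message describes for haproxy.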

biondizzle

bbe40ac8c0

Add middleware to strip vLLM-only params (logprobs/top_logprobs) before forwarding to SGLang

SGLang's Mistral tool-call parser rejects logprobs/top_logprobs with 422,
while vLLM accepts them. Clients like OpenClaw send these by default.

New architecture: haproxy (port N) → middleware (port N+2) → SGLang (port N+1)
The middleware is a thin FastAPI app that strips incompatible params from
chat completion request bodies and passes everything else through unchanged.

2026-04-12 18:58:37 +00:00
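The parameter stripping and port layout described in commit bbe40ac8c0 might look something like the sketch below. It is an illustration under assumptions, not the repository's implementation: the function names `strip_vllm_only_params` and `port_layout` are hypothetical, only the two parameters named in the commit message are removed, and the FastAPI request handling and forwarding are omitted.

```python
import json
from typing import Dict

# Parameters the commit message says SGLang's Mistral tool-call parser rejects.
VLLM_ONLY_PARAMS = ("logprobs", "top_logprobs")


def strip_vllm_only_params(raw_body: bytes) -> bytes:
    """Remove vLLM-only keys from a chat completion request body.

    Bodies that are not a JSON object, or that contain none of the keys,
    are returned unchanged so everything else passes through as-is.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return raw_body
    if not isinstance(payload, dict):
        return raw_body
    if not any(key in payload for key in VLLM_ONLY_PARAMS):
        return raw_body
    cleaned = {k: v for k, v in payload.items() if k not in VLLM_ONLY_PARAMS}
    return json.dumps(cleaned).encode("utf-8")


def port_layout(n: int) -> Dict[str, int]:
    """Port assignment from the commit message: haproxy N, SGLang N+1, middleware N+2."""
    return {"haproxy": n, "sglang": n + 1, "middleware": n + 2}
```

Returning the original bytes when nothing needs to change avoids re-serialising requests that were already acceptable, which keeps the middleware as thin as the commit message suggests.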

2 Commits