biondizzle 7fb373fdfc Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang

SGLang now runs on port+1, haproxy binds the original vLLM port.
haproxy serves a stub /metrics endpoint (200, empty body) and
passes all other traffic through to SGLang via raw TCP proxy.

2026-04-12 17:09:58 +00:00

Dockerfile

Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang

2026-04-12 17:09:58 +00:00

entrypoint.sh

init commit

2026-04-11 23:39:36 +00:00

Jenkinsfile

Fix Jenkinsfile: agent any, nightly default, proper quoting

2026-04-12 00:22:29 +00:00

README.md

Rewrite README: explain the shim, current state, and how to adapt for other models

2026-04-12 03:07:43 +00:00

vllm_shim_module.py

Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang

2026-04-12 17:09:58 +00:00

vllm-shim.sh

Add haproxy proxy: /metrics returns 200 empty, everything else proxies to SGLang

2026-04-12 17:09:58 +00:00

README.md

vLLM → SGLang Shim

Drop-in replacement that makes a vLLM production stack (e.g. the k8s operator) actually run SGLang instead.

Why?

The vLLM production stack handles model lifecycle, scaling, and routing — but some models work better (or only work) on SGLang. Rather than rewriting your deployment infra, this shim intercepts every vLLM invocation and launches SGLang with equivalent arguments.

How It Works

Two interception paths:

| What the stack calls | What happens |
| --- | --- |
| `vllm serve <model> [flags]` | Shell shim (`vllm-shim.sh`) parses args, execs `python -m sglang.launch_server` |
| `python -m vllm.entrypoints.openai.api_server` | Python shim (shadow module on `PYTHONPATH`) does the same |

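Both shims extract `--host` and `--port` from whatever the stack sends. As of the latest commit, haproxy binds the port the stack asked for, answers `/metrics` with a stub response (200, empty body), and passes all other traffic to SGLang, which listens on port + 1. Everything else is currently hardcoded for the target model.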
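The argument handling described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in either shim: the helper names, the default host and port, and the use of `argparse` are assumptions. The hardcoded values come from the "Current State" section, and the port + 1 offset follows the commit note at the top of this page.

```python
import argparse

# Values the README says are baked into both shims.
MODEL_PATH = "mistralai/Devstral-2-123B-Instruct-2512"
TP_SIZE = 8
TOOL_CALL_PARSER = "mistral"


def extract_host_port(argv, default_host="0.0.0.0", default_port=8000):
    """Pull --host/--port out of a vLLM-style argument list, ignoring the rest.

    The defaults are assumptions for this sketch, not values taken from the shim.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--host", default=default_host)
    parser.add_argument("--port", type=int, default=default_port)
    # parse_known_args leaves every unrecognized token (subcommand, model
    # name, other vLLM flags) in the second return value instead of failing.
    known, _ignored = parser.parse_known_args(argv)
    return known.host, known.port


def build_sglang_command(host, sglang_port):
    """Build the SGLang launch command with the hardcoded model settings."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", MODEL_PATH,
        "--host", host,
        "--port", str(sglang_port),
        "--tp", str(TP_SIZE),
        "--tool-call-parser", TOOL_CALL_PARSER,
    ]


def translate(argv):
    """Map a vLLM invocation to an SGLang command listening on port + 1."""
    host, port = extract_host_port(argv)
    # The proxy keeps the original port; SGLang moves one port up.
    return build_sglang_command(host, port + 1)
```

A real shim would hand the resulting list to something like `os.execvp(cmd[0], cmd)` so that SGLang replaces the shim process; the sketch stops at building the list so it can be inspected without launching anything.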

Current State

PoC — hardcoded for mistralai/Devstral-2-123B-Instruct-2512 on 8× MI300X.

- Model path, `--tp 8`, and `--tool-call-parser mistral` are baked into both shims
- The Dockerfile builds on `lmsysorg/sglang-rocm` and patches a broken aiter build from the base image
- MI300X tuning env vars are set (`HIP_FORCE_DEV_KERNARG`, `NCCL_MIN_NCHANNELS`, etc.)

Building

docker build -t vllm-to-sglang .

Then use this image anywhere the vLLM stack expects its server image.

Making It Work For Other Models

Right now the model config is hardcoded in three places:

- `vllm-shim.sh`: the `exec python -m sglang.launch_server` line
- `vllm_shim_module.py`: the `os.execvp()` call
- `Dockerfile`: base image and ROCm-specific patches

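To adapt for a different model, change `--model-path`, `--tp`, and `--tool-call-parser` in both shim files; the Dockerfile typically needs changes only if the base image or the ROCm/MI300X target changes. A future pass will make this configurable via env vars or args so you don't have to edit source.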
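One way the env-var approach mentioned above could look is sketched here. Nothing in it exists in the repository yet: the variable names (`SGLANG_SHIM_MODEL_PATH`, `SGLANG_SHIM_TP`, `SGLANG_SHIM_TOOL_CALL_PARSER`) and both helper functions are hypothetical, and the defaults simply repeat the values currently hardcoded in the shims.

```python
import os

# Defaults mirror what the README says is hardcoded today.
DEFAULTS = {
    "model_path": "mistralai/Devstral-2-123B-Instruct-2512",
    "tp": "8",
    "tool_call_parser": "mistral",
}

# Hypothetical environment variable names, chosen for this sketch only.
ENV_NAMES = {
    "model_path": "SGLANG_SHIM_MODEL_PATH",
    "tp": "SGLANG_SHIM_TP",
    "tool_call_parser": "SGLANG_SHIM_TOOL_CALL_PARSER",
}


def load_shim_config(environ=None):
    """Read model settings from the environment, falling back to DEFAULTS."""
    env = os.environ if environ is None else environ
    config = {key: env.get(ENV_NAMES[key], default)
              for key, default in DEFAULTS.items()}
    # Reject a tensor-parallel size that is not a positive integer.
    if not config["tp"].isdigit() or int(config["tp"]) < 1:
        raise ValueError(f"{ENV_NAMES['tp']} must be a positive integer")
    return config


def model_flags(config):
    """Turn a config dict into the SGLang flags the shims currently hardcode."""
    return [
        "--model-path", config["model_path"],
        "--tp", config["tp"],
        "--tool-call-parser", config["tool_call_parser"],
    ]
```

With a helper like this, switching models would mean setting environment variables on the container rather than editing either shim; with no variables set, the flags match today's hardcoded behavior.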

Files

| File | Purpose |
| --- | --- |
| `Dockerfile` | Builds the image: ROCm SGLang base + aiter fix + shims + MI300X env |
| `vllm-shim.sh` | Shell shim: replaces the `vllm` binary |
| `vllm_shim_module.py` | Python shim: shadows `vllm.*` module imports |
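The module shadowing in the last row relies on Python's import path ordering: a directory that appears earlier on `sys.path` (which `PYTHONPATH` feeds) wins module resolution. The sketch below demonstrates that mechanism with a throwaway directory and empty stand-in files. The directory layout and helper names are illustrative assumptions, not the contents of `vllm_shim_module.py`, and the demo assumes `vllm` has not already been imported in the current process.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

SHADOWED_MODULE = "vllm.entrypoints.openai.api_server"


def make_shadow_tree(root):
    """Create a stand-in package tree for SHADOWED_MODULE under root."""
    parts = SHADOWED_MODULE.split(".")
    pkg_dir = Path(root)
    for name in parts[:-1]:
        pkg_dir = pkg_dir / name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
    module_file = pkg_dir / (parts[-1] + ".py")
    module_file.write_text("# a real shim would launch SGLang from here\n")
    return module_file


def resolve_with_shadow(root):
    """Return the file Python would load for SHADOWED_MODULE with root first on sys.path."""
    before = set(sys.modules)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # find_spec imports the parent packages but not the module itself.
        spec = importlib.util.find_spec(SHADOWED_MODULE)
        return Path(spec.origin)
    finally:
        sys.path.remove(str(root))
        # Drop only the stand-in packages this call added to the module cache.
        for name in set(sys.modules) - before:
            if name == "vllm" or name.startswith("vllm."):
                del sys.modules[name]
```

Calling `make_shadow_tree` on a temporary directory and then `resolve_with_shadow` on the same directory should return the stand-in file, which is the same lookup `python -m vllm.entrypoints.openai.api_server` performs when the shim directory is on `PYTHONPATH`.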
