From 2ac7778c1577cb655694475e3fc909094073f376 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Sun, 12 Apr 2026 03:07:43 +0000
Subject: [PATCH] Rewrite README: explain the shim, current state, and how to
 adapt for other models

---
 README.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 3923088..ae57e66 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,52 @@
-# vLLM to SGLang
+# vLLM → SGLang Shim
 
-vLLM production stack is great, but some things just work out of the box on sglang.
+Drop-in replacement that makes a vLLM production stack (e.g. the [k8s operator](https://github.com/vllm-project/production-stack)) actually run [SGLang](https://github.com/sgl-project/sglang) instead.
 
-this just kind of just proxies the bootstrapping of a model deployment in vllm stack to sglang
+## Why?
 
-this is a PoC im trying out right now.
\ No newline at end of file
+The vLLM production stack handles model lifecycle, scaling, and routing — but some models work better (or only work) on SGLang. Rather than rewriting your deployment infra, this shim intercepts every vLLM invocation and launches SGLang with equivalent arguments.
+
+## How It Works
+
+Two interception paths:
+
+| What the stack calls | What happens |
+|---|---|
+| `vllm serve [flags]` | Shell shim (`vllm-shim.sh`) parses args, execs `python -m sglang.launch_server` |
+| `python -m vllm.entrypoints.openai.api_server` | Python shim (shadow module on `PYTHONPATH`) does the same |
+
+Both extract `--host` and `--port` from whatever the stack sends and forward them to SGLang. Everything else is currently hardcoded for the target model.
+
+## Current State
+
+**PoC — hardcoded for `mistralai/Devstral-2-123B-Instruct-2512` on 8× MI300X.**
+
+- Model path, `--tp 8`, and `--tool-call-parser mistral` are baked into both shims
+- The Dockerfile builds on `lmsysorg/sglang-rocm` and patches a broken `aiter` build from the base image
+- MI300X tuning env vars are set (`HIP_FORCE_DEV_KERNARG`, `NCCL_MIN_NCHANNELS`, etc.)
+
+## Building
+
+```bash
+docker build -t vllm-to-sglang .
+```
+
+Then use this image anywhere the vLLM stack expects its server image.
+
+## Making It Work For Other Models
+
+Right now the model config is hardcoded in three places:
+
+- `vllm-shim.sh` — the `exec python -m sglang.launch_server` line
+- `vllm_shim_module.py` — the `os.execvp()` call
+- `Dockerfile` — base image and ROCm-specific patches
+
+To adapt for a different model, change `--model-path`, `--tp`, and `--tool-call-parser` in both shim files. A future pass will make this configurable via env vars or args so you don't have to edit source.
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `Dockerfile` | Builds the image: ROCm SGLang base + aiter fix + shims + MI300X env |
+| `vllm-shim.sh` | Shell shim — replaces the `vllm` binary |
+| `vllm_shim_module.py` | Python shim — shadows `vllm.*` module imports |
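For reviewers: the Python interception path the README describes (a shadow module on `PYTHONPATH` that extracts `--host`/`--port` and re-execs into SGLang via `os.execvp()`) could look roughly like the sketch below. This is a hypothetical reconstruction, not the actual `vllm_shim_module.py`: the `extract_host_port` helper, its default values, and the use of `sys.executable` are invented here; only the hardcoded model flags come from the patch text above.

```python
import os
import sys


def extract_host_port(argv, default_host="0.0.0.0", default_port="8000"):
    """Pull --host/--port out of the flags the vLLM stack sent; ignore the rest."""
    host, port = default_host, default_port
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--host" and i + 1 < len(argv):
            host, i = argv[i + 1], i + 2
        elif arg.startswith("--host="):
            host, i = arg.split("=", 1)[1], i + 1
        elif arg == "--port" and i + 1 < len(argv):
            port, i = argv[i + 1], i + 2
        elif arg.startswith("--port="):
            port, i = arg.split("=", 1)[1], i + 1
        else:
            i += 1  # every other flag is dropped; the SGLang args below are hardcoded


    return host, port


def main():
    host, port = extract_host_port(sys.argv[1:])
    # Replace the current process with the SGLang server. exec (rather than
    # subprocess) keeps the same PID, so container signal handling and k8s
    # probes aimed at this process keep working.
    os.execvp(sys.executable, [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", "mistralai/Devstral-2-123B-Instruct-2512",
        "--tp", "8",
        "--tool-call-parser", "mistral",
        "--host", host,
        "--port", port,
    ])
# In the real shim, main() would run when the stack invokes
# `python -m vllm.entrypoints.openai.api_server [flags]`.
```

Because the module only forwards host/port, any extra flags the vLLM stack passes (e.g. `--max-model-len`) are silently discarded — consistent with the README's note that everything else is hardcoded for now.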