Initial chat template debugger - vLLM raw token inspector

2026-04-10 15:28:41 +00:00
commit c981416dde
6 changed files with 184 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,69 @@
+# Chat Template Debugger
+
+Isolate whether tool-call failures are a **model problem** or a **parser/template problem**.
+
+Runs vLLM inside Docker, bypasses all OpenClaw middleware, and captures the model's raw token output directly.
+
+## The Problem
+
+90% of models break on streaming tool calls. Is it the model generating garbage, or is something in the middleware stack mangling the output? This debugger lets us answer that definitively.
+
+## Plan of Attack
+
+### 1. Build & Run the Container
+
+```bash
+docker build -t ct-debug .
+docker run --gpus all -v "$(pwd)/scripts:/workspace/scripts" -v "$(pwd)/models:/workspace/models" -it ct-debug
+```
+
+### 2. Stage 0 — Download Weights (if not mounted)
+
+```bash
+# Inside the container:
+python /workspace/scripts/stage0_download.py
+```
+
+This downloads `HuggingFaceTB/SmolLM3-3B` to `/workspace/models/SmolLM3-3B` if it doesn't already exist.
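
The skip-if-present behaviour described above could be sketched as follows. This is an illustration, not the actual contents of `stage0_download.py`: the helper names `weights_present` and `ensure_weights` are hypothetical, and the "already downloaded" check (a `config.json` plus at least one weights file) is an assumed heuristic. The only third-party call is `snapshot_download` from `huggingface_hub`, imported lazily so the check itself has no dependencies.

```python
from pathlib import Path

MODEL_ID = "HuggingFaceTB/SmolLM3-3B"
MODEL_DIR = Path("/workspace/models/SmolLM3-3B")


def weights_present(model_dir: Path) -> bool:
    """Assumed heuristic: a usable snapshot has config.json and a weights file."""
    if not (model_dir / "config.json").is_file():
        return False
    has_safetensors = any(model_dir.glob("*.safetensors"))
    has_bin = any(model_dir.glob("*.bin"))
    return has_safetensors or has_bin


def ensure_weights(model_id: str = MODEL_ID, model_dir: Path = MODEL_DIR) -> Path:
    """Download the model only when the target directory lacks a snapshot."""
    if weights_present(model_dir):
        return model_dir  # mounted or previously downloaded; nothing to do
    # Imported here so the presence check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=model_id, local_dir=str(model_dir))
    return model_dir
```

Calling `ensure_weights()` with no arguments would cover the default case; passing a different `model_id` and `model_dir` covers a swapped model.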
+
+### 3. Stage 1 — Run the Debugger
+
+Edit `scripts/stage1_debug.py` to point at the model path and your test prompt. Then:
+
+```bash
+# Inside the container:
+python /workspace/scripts/stage1_debug.py
+```
+
+This runs the model on a raw prompt string that you control directly; vLLM's serving layer never applies a chat template. It dumps:
+
+- The raw generated text
+- The actual token IDs
+- A per-token decode so you can see exactly what the model emitted
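
The three outputs listed above could be produced along these lines. This is a sketch under assumptions, not the actual `stage1_debug.py`: the helper names `per_token_decode`, `format_dump`, and `run_raw` are hypothetical, as are the sampling settings. It relies on vLLM's offline `LLM.generate` API and sets `skip_special_tokens=False` so that special tokens stay visible in the returned text.

```python
from typing import Callable, Iterable, List, Tuple


def per_token_decode(
    token_ids: Iterable[int], decode_one: Callable[[int], str]
) -> List[Tuple[int, str]]:
    """Pair every token ID with the text it decodes to on its own."""
    return [(tid, decode_one(tid)) for tid in token_ids]


def format_dump(text: str, rows: List[Tuple[int, str]]) -> str:
    """Render raw text and per-token rows; repr() exposes whitespace and newlines."""
    lines = ["=== RAW TEXT ===", repr(text), "=== TOKENS ==="]
    for i, (tid, piece) in enumerate(rows):
        lines.append(f"{i:4d}  id={tid:<8d} {piece!r}")
    return "\n".join(lines)


def run_raw(model_path: str, prompt: str, max_tokens: int = 256) -> str:
    """Generate from a raw prompt string and return the formatted dump."""
    # Imported lazily: vLLM is only needed for the actual generation step.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path)
    params = SamplingParams(
        temperature=0.0,            # greedy decoding, for repeatable runs
        max_tokens=max_tokens,
        skip_special_tokens=False,  # keep special tokens in the output text
    )
    completion = llm.generate([prompt], params)[0].outputs[0]
    tokenizer = llm.get_tokenizer()
    rows = per_token_decode(
        completion.token_ids, lambda tid: tokenizer.decode([tid])
    )
    return format_dump(completion.text, rows)
```

Keeping the formatting separate from the vLLM call means the dump logic can be checked with a fake decoder, without a GPU.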
+
+### 4. Analyze
+
+- If the model emits correct tool-call tokens → **parser/template problem**
+- If the model emits garbage or broken tokens → **model problem**, go fix the LoRA/chat template
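
The decision rule above can be turned into a rough automated check. The sketch below assumes the model wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object with `name` and `arguments` keys. That format is an assumption for illustration; adjust the pattern to whatever the model under test is supposed to emit. The function name `classify` is hypothetical.

```python
import json
import re

# Assumed tool-call format: <tool_call>{"name": ..., "arguments": ...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def classify(raw_text: str) -> str:
    """Return which side of the stack to investigate, given raw model output."""
    payloads = TOOL_CALL_RE.findall(raw_text)
    if not payloads:
        return "model problem"  # no complete tool-call block was emitted
    for payload in payloads:
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            return "model problem"  # block present but the JSON is broken
        if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
            return "model problem"  # valid JSON but not a usable call
    # Raw output is well-formed, so a failure must come from later in the stack.
    return "parser/template problem"
```

A check like this only confirms that the raw output is structurally valid; whether the call is the right one for the prompt still needs a human look at the dump.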
+
+## Directory Layout
+
+```
+chat-template-debugger/
+├── Dockerfile
+├── README.md
+├── models/              # Downloaded weights (gitignored)
+├── scripts/
+│   ├── stage0_download.py
+│   └── stage1_debug.py
+└── prompts/
+    └── smol_tool_call.txt
+```
+
+## Swapping Models
+
+Change `MODEL_ID` in `stage0_download.py` and `MODEL_PATH` in `stage1_debug.py`. This should work with any Hugging Face model whose architecture vLLM supports.
+
+## Swapping Prompts
+
+Drop a `.txt` file in `prompts/` and update the path in `stage1_debug.py`. The prompt is passed as a raw string and vLLM applies no chat template, so you control the full context.
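
Because the prompt is passed through verbatim, trailing newlines and line endings in the `.txt` file become part of the model's context. A minimal sketch of loading a prompt without altering it is shown below; `load_prompt` and `prompt_summary` are hypothetical helper names, not functions from `stage1_debug.py`.

```python
from pathlib import Path
from typing import Union


def load_prompt(path: Union[str, Path]) -> str:
    """Read the prompt exactly as stored, with no stripping or newline translation."""
    # newline="" disables universal-newline translation, so "\r\n" stays "\r\n".
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()


def prompt_summary(prompt: str, tail: int = 40) -> str:
    """Describe the prompt's length and tail; repr() makes trailing whitespace visible."""
    return f"{len(prompt)} chars, ends with {prompt[-tail:]!r}"
```

Printing `prompt_summary(load_prompt("prompts/smol_tool_call.txt"))` before generation makes it easy to spot a stray newline after the final turn marker.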