Chat Template Debugger

Isolate whether tool-call failures are a model problem or a parser/template problem.

Runs vLLM inside Docker, bypasses all OpenClaw middleware, and captures the raw token output directly from the model.

The Problem

About 90% of the models we have tried break on streaming tool calls. Is the model generating garbage, or is something in the middleware stack mangling the output? This debugger answers that by showing exactly what the model emitted before any middleware touches it.

Plan of Attack

1. Build & Run the Container

docker build -t ct-debug .
docker run --gpus all -v "$(pwd)/scripts:/workspace/scripts" -v "$(pwd)/models:/workspace/models" -it ct-debug

2. Stage 0 — Download Weights (if not mounted)

# Inside the container:
python /workspace/scripts/stage0_download.py

This downloads HuggingFaceTB/SmolLM3-3B to /workspace/models/SmolLM3-3B if it doesn't already exist.
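For orientation, the download-if-missing behavior described above could look roughly like the sketch below. It is not the contents of stage0_download.py: the helper names `needs_download` and `download` are hypothetical, and it assumes the `huggingface_hub` package is installed in the container.

```python
from pathlib import Path

# Values taken from this README; adjust if your stage0_download.py differs.
MODEL_ID = "HuggingFaceTB/SmolLM3-3B"
MODEL_DIR = Path("/workspace/models/SmolLM3-3B")


def needs_download(model_dir: Path) -> bool:
    """Return True when the directory is missing or empty (hypothetical helper)."""
    return not model_dir.is_dir() or not any(model_dir.iterdir())


def download(model_id: str = MODEL_ID, model_dir: Path = MODEL_DIR) -> Path:
    """Fetch the weights once; skip the network call if they are already there."""
    if needs_download(model_dir):
        # Imported lazily so the existence check works without the dependency.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=model_id, local_dir=str(model_dir))
    return model_dir
```

Calling `download()` inside the container populates the mounted models/ directory on the first run and does nothing on later runs. The emptiness check is deliberately crude: a partially downloaded directory counts as present, so delete it by hand if a download was interrupted.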

3. Stage 1 — Run the Debugger

Edit scripts/stage1_debug.py to point at the model path and your test prompt. Then:

# Inside the container:
python /workspace/scripts/stage1_debug.py

This runs the model with a raw prompt (no chat template applied by vLLM's serving layer — you control the prompt string directly). It dumps:

  • The raw generated text
  • The actual token IDs
  • A per-token decode so you can see exactly what the model emitted
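The three dumps listed above can be produced with vLLM's offline `LLM` class. The following is a minimal sketch under that assumption, not the actual stage1_debug.py; `run_raw` and `per_token_rows` are hypothetical names, and the sampling settings are illustrative defaults.

```python
from typing import Callable, List, Sequence


def per_token_rows(token_ids: Sequence[int], decode: Callable[[List[int]], str]) -> List[str]:
    """Decode each token ID on its own so token boundaries are visible."""
    return [f"{i:4d}  {tid:7d}  {decode([tid])!r}" for i, tid in enumerate(token_ids)]


def run_raw(model_path: str, prompt: str, max_tokens: int = 256) -> None:
    # Imported lazily: vLLM is only expected to exist inside the container.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path)
    params = SamplingParams(
        temperature=0.0,            # greedy decoding for reproducible output
        max_tokens=max_tokens,
        skip_special_tokens=False,  # keep special tokens in the dumped text
    )
    # generate() receives the prompt string as-is; no chat template is applied.
    completion = llm.generate([prompt], params)[0].outputs[0]
    tokenizer = llm.get_tokenizer()

    print("RAW TEXT:", repr(completion.text))
    print("TOKEN IDS:", list(completion.token_ids))
    print("\n".join(per_token_rows(completion.token_ids, tokenizer.decode)))
```

Setting `skip_special_tokens=False` matters here: if the model marks tool calls with special tokens, the default setting would strip them from the text and hide exactly the evidence you are looking for.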

4. Analyze

  • If the model emits correct tool-call tokens → parser/template problem
  • If the model emits garbage or broken tokens → model problem, go fix the LoRA/chat template
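The decision rule above can be automated once you know what a correct tool call looks like for your model. The sketch below assumes the model wraps a JSON object with `name` and `arguments` keys in `<tool_call>...</tool_call>` tags; that format and the `classify` helper are assumptions for illustration, so check your model's chat template before relying on it.

```python
import json
import re

# Assumed tag format; replace with whatever your model's template specifies.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def classify(raw_text: str) -> str:
    """Return 'parser/template' if every tool call in the raw output is well formed, else 'model'."""
    bodies = TOOL_CALL_RE.findall(raw_text)
    if not bodies:
        return "model"  # no complete tool-call block was emitted
    for body in bodies:
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            return "model"  # the block exists but its payload is not valid JSON
        if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
            return "model"  # valid JSON, but not shaped like a tool call
    return "parser/template"
```

A result of "model" for a prompt that was not supposed to trigger a tool call is expected, so run this only against prompts such as prompts/smol_tool_call.txt where a tool call is the correct response.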

Directory Layout

chat-template-debugger/
├── Dockerfile
├── README.md
├── models/              # Downloaded weights (gitignored)
├── scripts/
│   ├── stage0_download.py
│   └── stage1_debug.py
└── prompts/
    └── smol_tool_call.txt

Swapping Models

Change MODEL_ID in stage0_download.py and MODEL_PATH in stage1_debug.py. This works with any Hugging Face model that vLLM supports.

Swapping Prompts

Drop a .txt file in prompts/ and update the path in stage1_debug.py. The prompt is passed as a raw string — no chat template is applied by vLLM. You control the full context.
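Because the prompt is used verbatim, whitespace and trailing newlines in the .txt file become part of the context and can change tokenization. A small sketch of loading the file without altering it follows; `load_prompt` is a hypothetical helper, and stage1_debug.py may read the file differently.

```python
def load_prompt(path: str) -> str:
    """Read the prompt exactly as stored: no stripping, no newline translation."""
    # newline="" disables universal-newline translation, so "\r\n" stays "\r\n".
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()
```

Printing `repr(load_prompt(path)[-40:])` before a run is a quick way to confirm that the prompt ends where you expect, for example right after the assistant-turn marker.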
