Initial chat template debugger - vLLM raw token inspector

README.md

# Chat Template Debugger

Isolate whether tool-call failures are a **model problem** or a **parser/template problem**.

The debugger runs vLLM inside Docker, bypasses all OpenClaw middleware, and captures the model's raw token output directly.

## The Problem

90% of models break on streaming tool calls. Is it the model generating garbage, or is something in the middleware stack mangling the output? This debugger lets us answer that definitively.

## Plan of Attack

### 1. Build & Run the Container

```bash
docker build -t ct-debug .
# Quote the bind mounts so a working directory containing spaces still works.
docker run --gpus all \
  -v "$(pwd)/scripts:/workspace/scripts" \
  -v "$(pwd)/models:/workspace/models" \
  -it ct-debug
```

### 2. Stage 0 — Download Weights (if not mounted)

```bash
# Inside the container:
python /workspace/scripts/stage0_download.py
```

This downloads `HuggingFaceTB/SmolLM3-3B` to `/workspace/models/SmolLM3-3B` if it doesn't already exist.

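The "download only if missing" behaviour described above could look roughly like the sketch below. It is not the actual contents of `stage0_download.py`. The `MODEL_DIR` constant and both helper names are illustrative assumptions, and the sketch assumes the `huggingface_hub` package is installed in the container.

```python
from pathlib import Path

MODEL_ID = "HuggingFaceTB/SmolLM3-3B"
# Assumed constant name; the path matches the target directory named above.
MODEL_DIR = Path("/workspace/models/SmolLM3-3B")


def needs_download(model_dir: Path) -> bool:
    """Return True when model_dir is missing or contains no entries."""
    return not model_dir.is_dir() or not any(model_dir.iterdir())


def download_weights(model_id: str = MODEL_ID, model_dir: Path = MODEL_DIR) -> Path:
    """Fetch the model snapshot unless weights are already on disk."""
    if needs_download(model_dir):
        # Imported lazily so the existence check runs without the library.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=model_id, local_dir=str(model_dir))
    return model_dir
```

Calling `download_weights()` with a populated `models/` mount returns immediately, which is why the stage can be skipped when the weights are mounted.
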
### 3. Stage 1 — Run the Debugger

Edit `scripts/stage1_debug.py` to point at the model path and your test prompt. Then:

```bash
# Inside the container:
python /workspace/scripts/stage1_debug.py
```

This runs the model on a raw prompt: vLLM's serving layer applies no chat template, so you control the prompt string directly. It dumps:

- The raw generated text
- The actual token IDs
- A per-token decode so you can see exactly what the model emitted

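As a rough illustration of the three dumps listed above, here is a minimal sketch built on vLLM's offline `LLM.generate` API. It is an assumption about how such a script could be written, not the real `stage1_debug.py`. The helper names, the `max_tokens` default, and the choice of greedy sampling are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

MODEL_PATH = "/workspace/models/SmolLM3-3B"


def per_token_decode(
    token_ids: Sequence[int], decode: Callable[[List[int]], str]
) -> List[Tuple[int, str]]:
    """Decode each token ID on its own so individual tokens stand out."""
    return [(tid, decode([tid])) for tid in token_ids]


def run_raw(prompt: str, model_path: str = MODEL_PATH, max_tokens: int = 256) -> None:
    """Generate from a raw prompt and print text, token IDs, and per-token pieces."""
    # Imported lazily: vLLM is assumed to exist only inside the container.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path)
    # Greedy decoding for reproducibility; keep special tokens visible in the text.
    params = SamplingParams(
        temperature=0.0, max_tokens=max_tokens, skip_special_tokens=False
    )
    completion = llm.generate([prompt], params)[0].outputs[0]
    tokenizer = llm.get_tokenizer()

    print("RAW TEXT:", repr(completion.text))
    print("TOKEN IDS:", list(completion.token_ids))
    for tid, piece in per_token_decode(completion.token_ids, tokenizer.decode):
        print(f"{tid:>8}  {piece!r}")
```

Printing each piece with `repr` makes whitespace, newlines, and special tokens visible. Those are usually the details that separate a well-formed tool call from a broken one.
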
### 4. Analyze

- If the model emits correct tool-call tokens → **parser/template problem**: the fault is downstream of the model, in the middleware stack.
- If the model emits garbage or broken tokens → **model problem**: go fix the LoRA or chat template.

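The decision above can be partly automated with a quick check on the raw text. The sketch below assumes tool calls are wrapped in `<tool_call>...</tool_call>` tags around a JSON object with a `name` field. That format is an assumption, so adjust the pattern to whatever the model's chat template actually defines. The function name is hypothetical.

```python
import json
import re

# Assumed delimiters; replace with the markers your chat template uses.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def has_valid_tool_call(raw_text: str) -> bool:
    """Return True if every tagged payload is a JSON object with a 'name' key."""
    payloads = TOOL_CALL_RE.findall(raw_text)
    if not payloads:
        return False  # no complete tool-call block was emitted at all
    for payload in payloads:
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            return False  # the model produced malformed JSON
        if not isinstance(call, dict) or "name" not in call:
            return False
    return True
```

If this returns `True` on the raw output while the full stack still fails, suspicion shifts to the parser or template. If it returns `False`, the model itself is the likelier culprit.
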
## Directory Layout

```
chat-template-debugger/
├── Dockerfile
├── README.md
├── models/              # Downloaded weights (gitignored)
├── scripts/
│   ├── stage0_download.py
│   └── stage1_debug.py
└── prompts/
    └── smol_tool_call.txt
```

## Swapping Models

Change `MODEL_ID` in `stage0_download.py` and `MODEL_PATH` in `stage1_debug.py`. Any Hugging Face model that vLLM can load should work.

## Swapping Prompts

Drop a `.txt` file in `prompts/` and update the path in `stage1_debug.py`. The prompt is passed as a raw string, with no chat template applied by vLLM, so you control the full context.

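Loading the prompt file could be as simple as the sketch below. The `PROMPT_PATH` constant and `load_prompt` helper are hypothetical names, and the in-container path is an assumption. The `docker run` command in step 1 mounts only `scripts/` and `models/`, so `prompts/` would need to be copied into the image or mounted separately.

```python
from pathlib import Path

# Assumed in-container location of the prompt file.
PROMPT_PATH = Path("/workspace/prompts/smol_tool_call.txt")


def load_prompt(path: Path = PROMPT_PATH) -> str:
    """Read the prompt verbatim, keeping trailing whitespace and newlines."""
    return path.read_text(encoding="utf-8")
```

The text is deliberately not stripped. With a raw prompt, a trailing newline or space changes the tokens the model sees, so the file content should reach vLLM unchanged.
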