# SmolLM3-3B LoRA Training — Deployment Runbook

## Objective

Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens
(IDs 128015/128016) instead of code-dumping. See `TRAINING_PLAN.md` for the
full strategy.

## Prerequisites

- [ ] GPU server accessible via SSH (`root@107.191.43.158`)
- [ ] Docker + NVIDIA Container Toolkit installed
- [ ] This repo cloned at `/root/smollora` on the GPU server

## Step 1: Sync the Code

From the OpenClaw workspace, push any changes:

```bash
cd /home/openclaw/dev/smollora
git add -A && git commit -m "updates" && git push
```

On the GPU server, pull the latest:

```bash
ssh root@107.191.43.158
cd /root/smollora && git pull
```

> **Rule:** Always mutate code on the OpenClaw side, push, then pull on the GPU server.
> Never edit files directly on the server — changes won't propagate back.

## Step 2: Verify the Data Prep Will Produce Correct Tokens

Before training, confirm the processed data will contain token IDs 128015/128016.
After data prep runs (or as a dry run), check:

```python
import json

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

with open("/data/processed/train.jsonl") as f:
    sample = json.loads(f.readline())

text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
# The templated text already contains special tokens, so don't prepend another BOS.
ids = tokenizer.encode(text, add_special_tokens=False)

assert 128015 in ids, "Tool call start token (128015) missing — data prep is broken!"
assert 128016 in ids, "Tool call end token (128016) missing — data prep is broken!"
print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")
```

If this fails, do NOT proceed to training. Fix `prepare_data.py` or the tokenizer first.

## Step 3: Build & Run

```bash
cd /root/smollora
docker compose build
docker compose up -d
```

Verify it's running:

```bash
docker compose ps
docker compose logs --tail=20
```

## Step 4: Exec In & Run Training

```bash
docker compose exec smollora bash
```

Inside the container:

```bash
# Full pipeline (data prep + train)
/app/run.sh

# Or skip data prep if already done
SKIP_PREP=1 /app/run.sh
```

### Key Training Parameters for This Run

These should be set in `run.sh` or passed as env vars:

| Param | Value | Notes |
|-------|-------|-------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model |
| `EPOCHS` | `3` | Increase to 5 if val loss still dropping |
| `LR` | `2e-4` | Drop to 1e-4 if loss spikes |
| `LORA_R` | `16` | Bump to 32 if loss plateaus |
| `BATCH_SIZE` | `4` | Reduce to 2 if OOM |
| `MAX_LENGTH` | `4096` | Enough for tool calls + code |

**Critical:** `embed_tokens` MUST be in the LoRA target modules. Verify in
`train_lora.py` that `target_modules` includes `"embed_tokens"`. Without it,
the adapter can't adjust the tool-call token embeddings and the model won't
learn to emit them.
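
As a sketch of what that requirement looks like in config form (the module names assume SmolLM3's Llama-style layer layout, and the `lora_alpha`/`lora_dropout` values are illustrative assumptions, not values from this runbook):

```python
# Sketch only: module names assume SmolLM3's Llama-style layer naming.
# The authoritative list lives in train_lora.py.
lora_kwargs = dict(
    r=16,                  # LORA_R from the table above
    lora_alpha=32,         # assumed value, not specified in this runbook
    lora_dropout=0.05,     # assumed value, not specified in this runbook
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "embed_tokens",                          # REQUIRED for tool-call tokens
    ],
)

# Without "embed_tokens" the adapter never touches the embedding rows for
# token IDs 128015/128016, so the model cannot learn to emit them.
assert "embed_tokens" in lora_kwargs["target_modules"]
```

In `train_lora.py` this would feed `peft`, e.g. `get_peft_model(base_model, LoraConfig(**lora_kwargs))`.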

## Step 5: Monitor

From the host:

```bash
docker compose logs -f
watch -n5 nvidia-smi
```

Watch for:

- ✅ Training loss decreasing steadily
- ✅ Val loss decreasing (not diverging from train loss)
- ✅ No CUDA OOM
- ❌ Val loss increasing while train loss decreases = overfitting → reduce epochs or add more data
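
The overfitting signal in the last bullet can be checked mechanically. A small sketch, operating on whatever loss history you scrape from the logs (the lists below are hypothetical):

```python
def overfit_signal(train_losses, val_losses, patience=2):
    """Flag overfitting: val loss rising for `patience` evals while train loss falls."""
    if len(val_losses) < patience + 1:
        return False
    val_rising = all(val_losses[-i] > val_losses[-i - 1] for i in range(1, patience + 1))
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return val_rising and train_falling

# Hypothetical histories: train keeps dropping while val turns upward.
assert overfit_signal([2.0, 1.5, 1.2, 1.0], [1.8, 1.4, 1.45, 1.5])
# Both still dropping: healthy.
assert not overfit_signal([2.0, 1.5, 1.2, 1.0], [1.8, 1.4, 1.3, 1.2])
```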

Expected timeline on a single A100:

- Data prep: ~10-20 min
- Training (3 epochs, ~15k samples): ~1-3 hours

## Step 6: Validate — Raw Token Emission Test

**Do not deploy until this passes.**

1. Merge the LoRA adapter into the base model:

   ```python
   from peft import PeftModel
   from transformers import AutoModelForCausalLM, AutoTokenizer

   base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   model = PeftModel.from_pretrained(base, "/data/lora-output/final")
   merged = model.merge_and_unload()
   merged.save_pretrained("/data/merged-model")
   tokenizer.save_pretrained("/data/merged-model")
   ```

2. Copy the merged model to the chat-template-debugger:

   ```bash
   cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
   ```

3. Run the raw token debugger (stage 1):

   ```bash
   docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
     -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
     ct-debug-run python3 /workspace/scripts/stage1_debug.py
   ```

4. **Pass criteria:**
   - Token IDs **128015** and **128016** appear in the output
   - Valid JSON follows token 128015
   - No Python code-dumping

5. Also test with `smol_save_config.txt` prompt — same criteria.
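
The pass criteria lend themselves to an automated check. A sketch, where the ID list and decode function are stand-ins for the debugger's real output:

```python
import json

TOOL_START, TOOL_END = 128015, 128016

def check_tool_call(output_ids, decode):
    """Check that a generation contains TOOL_START, valid JSON, then TOOL_END."""
    if TOOL_START not in output_ids or TOOL_END not in output_ids:
        return False, "tool-call tokens missing"
    start = output_ids.index(TOOL_START)
    end = output_ids.index(TOOL_END, start)
    payload = decode(output_ids[start + 1:end])
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False, "payload between tokens is not valid JSON"
    return True, "ok"

# Hypothetical output: the ids between the markers decode to a JSON tool call.
fake_ids = [1, 2, TOOL_START, 10, 11, TOOL_END, 3]
fake_decode = lambda ids: '{"name": "write_file", "arguments": {"path": "a.txt"}}'
ok, why = check_tool_call(fake_ids, fake_decode)
assert ok, why
```

With the real model, `output_ids` would come from the stage-1 debugger and `decode` would be `tokenizer.decode`.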

If the model still code-dumps, the training didn't work. Check:

- Were tokens 128015/128016 in the training data? (Step 2)
- Is `embed_tokens` in the LoRA targets?
- Was there enough data / enough epochs?

## Step 7: Deploy to vLLM

Once validation passes:

1. Copy the merged model to the vLLM model directory
2. Update the vLLM docker-compose to point at the merged model
3. Restart vLLM
4. Run the streaming tool call tests from `/home/openclaw/dev/model-tool-tests`

## Troubleshooting

| Problem | Fix |
|---------|-----|
| CUDA OOM | Reduce `BATCH_SIZE` to 2, increase `GRAD_ACCUM` to 8, or reduce `MAX_LENGTH` to 2048 |
| Dataset download fails | Check internet; can pre-download and mount into `/data` |
| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
| Training loss not decreasing | Check LR; verify labels aren't all -100; verify token IDs in data (Step 2) |
| Model still code-dumps after training | Verify `embed_tokens` in targets; try more epochs; try `LORA_R=32` |
| Model emits tokens but broken JSON | Need more diverse tool-call samples; increase `MAX_LENGTH` |
| Model emits tool tokens for everything | Overfit — add 30% non-tool instruction data to training mix |
| Disk full | Clean up `/srv/smollora/hf-cache` after model loads |
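
The "labels aren't all -100" check can be scripted. A sketch, using the standard HF convention that -100 marks positions excluded from the loss (the sample labels below are hypothetical):

```python
def trainable_fraction(labels):
    """Fraction of label positions that actually contribute to the loss."""
    trainable = sum(1 for t in labels if t != -100)
    return trainable / max(len(labels), 1)

# Hypothetical sample: a long masked prompt followed by a short tool-call target.
labels = [-100] * 50 + [128015, 1234, 5678, 128016]
frac = trainable_fraction(labels)
assert frac > 0, "All labels are -100; nothing for this sample to teach"
print(f"{frac:.1%} of positions contribute to the loss")
```

Run this over a handful of processed samples; a fraction of zero on any of them means the loss masking in `prepare_data.py` is wrong.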

## Rollback

```bash
docker compose down
rm -rf /srv/smollora/output/*
# Fix whatever broke, then:
docker compose up -d
```