smollora/RUNBOOK.md

SmolLM3-3B LoRA Training — Deployment Runbook

Objective

Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens (IDs 128015/128016) instead of code-dumping. See TRAINING_PLAN.md for the full strategy.

Prerequisites

  • GPU server accessible via SSH (root@107.191.43.158)
  • Docker + NVIDIA Container Toolkit installed
  • This repo cloned at /root/smollora on the GPU server

Step 1: Sync the Code

From the OpenClaw workspace, push any changes:

cd /home/openclaw/dev/smollora
git add -A && git commit -m "updates" && git push

On the GPU server, pull the latest:

ssh root@107.191.43.158
cd /root/smollora && git pull

Rule: Always edit code on the OpenClaw side, push, then pull on the GPU server. Never edit files directly on the server — changes won't propagate back.

Step 2: Verify the Data Prep Will Produce Correct Tokens

Before training, confirm the processed data will contain token IDs 128015/128016. After data prep runs (or as a dry run), check:

from transformers import AutoTokenizer
import json

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

with open("/data/processed/train.jsonl") as f:
    sample = json.loads(f.readline())

text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
ids = tokenizer.encode(text, add_special_tokens=False)  # template output already includes special tokens

assert 128015 in ids, "Tool call start token (128015) missing — data prep is broken!"
assert 128016 in ids, "Tool call end token (128016) missing — data prep is broken!"
print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")

If this fails, do NOT proceed to training. Fix prepare_data.py or the tokenizer first.
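The snippet above checks only the first sample. A small helper can sweep every sample in the file; this is a sketch of just the counting logic, with the file-reading and tokenization loop (identical to the check above) elided:

```python
def missing_tool_tokens(tokenized_samples, start_id=128015, end_id=128016):
    """Return indices of samples that lack either tool-call token.

    `tokenized_samples` is any iterable of token-ID lists, e.g. the
    result of encoding each line of train.jsonl as in the check above.
    """
    return [
        i for i, ids in enumerate(tokenized_samples)
        if start_id not in ids or end_id not in ids
    ]

# Toy example: the second "sample" has no tool-call tokens.
bad = missing_tool_tokens([[1, 128015, 7, 128016, 2], [1, 7, 2]])
print(f"{len(bad)} bad sample(s): {bad}")  # → 1 bad sample(s): [1]
```

Any non-empty result means prepare_data.py dropped the tool-call markup for those samples.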

Step 3: Build & Run

cd /root/smollora
docker compose build
docker compose up -d

Verify it's running:

docker compose ps
docker compose logs --tail=20

Step 4: Exec In & Run Training

docker compose exec smollora bash

Inside the container:

# Full pipeline (data prep + train)
/app/run.sh

# Or skip data prep if already done
SKIP_PREP=1 /app/run.sh

Key Training Parameters for This Run

These should be set in run.sh or passed as env vars:

| Param | Value | Notes |
|---|---|---|
| MODEL | HuggingFaceTB/SmolLM3-3B | Base model |
| EPOCHS | 3 | Increase to 5 if val loss is still dropping |
| LR | 2e-4 | Drop to 1e-4 if loss spikes |
| LORA_R | 16 | Bump to 32 if loss plateaus |
| BATCH_SIZE | 4 | Reduce to 2 if OOM |
| MAX_LENGTH | 4096 | Enough for tool calls + code |

Critical: embed_tokens MUST be in the LoRA target modules. Verify in train_lora.py that target_modules includes "embed_tokens". Without it, the adapter can't adjust the tool-call token embeddings and the model won't learn to emit them.
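One way to catch a missing embed_tokens target before burning GPU hours is to inspect the saved adapter config. A minimal sketch, assuming a standard PEFT-style adapter_config.json with a target_modules list (the on-disk path is an assumption, so the example uses an inline dict):

```python
import json

def embed_tokens_targeted(adapter_config: dict) -> bool:
    """True if "embed_tokens" is among the LoRA target modules."""
    return "embed_tokens" in adapter_config.get("target_modules", [])

# In practice: adapter_config = json.load(open(".../adapter_config.json"))
cfg = {"r": 16, "target_modules": ["q_proj", "v_proj", "embed_tokens"]}
assert embed_tokens_targeted(cfg), "embed_tokens missing from LoRA targets!"
```

Run the same check against the config that train_lora.py actually writes before kicking off a multi-hour run.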

Step 5: Monitor

From the host:

docker compose logs -f
watch -n5 nvidia-smi

Watch for:

  • Training loss decreasing steadily
  • Val loss decreasing (not diverging from train loss)
  • No CUDA OOM errors

Warning sign: val loss increasing while train loss keeps decreasing means overfitting. Reduce epochs or add more data.
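The overfitting check can be automated on logged loss values. A sketch, assuming you can extract per-eval (train_loss, val_loss) pairs from the trainer logs; the function name and window size are illustrative:

```python
def overfitting(history, window=3):
    """Flag overfitting: over the last `window` eval steps, val loss
    rose while train loss kept falling. `history` is a list of
    (train_loss, val_loss) pairs in chronological order."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    train = [t for t, _ in recent]
    val = [v for _, v in recent]
    train_falling = all(b < a for a, b in zip(train, train[1:]))
    val_rising = all(b > a for a, b in zip(val, val[1:]))
    return train_falling and val_rising

# Diverging run: train falls, val climbs → True
print(overfitting([(2.0, 1.9), (1.7, 2.0), (1.5, 2.1), (1.3, 2.3)]))
```

If it fires mid-run, stop early rather than finishing all epochs.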

Expected timeline on a single A100:

  • Data prep: ~10-20 min
  • Training (3 epochs, ~15k samples): ~1-3 hours

Step 6: Validate — Raw Token Emission Test

Do not deploy until this passes.

  1. Merge the LoRA adapter into the base model:

    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
    model = PeftModel.from_pretrained(base, "/data/lora-output/final")
    merged = model.merge_and_unload()
    merged.save_pretrained("/data/merged-model")
    tokenizer.save_pretrained("/data/merged-model")
    
  2. Copy the merged model to the chat-template-debugger:

    cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
    
  3. Run the raw token debugger (stage 1):

    docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
      -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
      ct-debug-run python3 /workspace/scripts/stage1_debug.py
    
  4. Pass criteria:

    • Token IDs 128015 and 128016 appear in the output
    • Valid JSON follows token 128015
    • No Python code-dumping
  5. Also test with smol_save_config.txt prompt — same criteria.
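The pass criteria can also be checked mechanically on the debugger's output. A sketch, assuming you have the generated token IDs and a decode callable (e.g. tokenizer.decode); everything except the token IDs from the criteria above is illustrative:

```python
import json

def tool_call_ok(output_ids, decode, start_id=128015, end_id=128016):
    """Check that a tool-call span exists and wraps valid JSON.

    `decode` maps a list of token IDs to text.
    """
    try:
        start = output_ids.index(start_id)
        end = output_ids.index(end_id, start + 1)
    except ValueError:
        return False  # one or both tool-call tokens missing
    try:
        json.loads(decode(output_ids[start + 1:end]))
    except (json.JSONDecodeError, ValueError):
        return False  # tokens present but payload isn't valid JSON
    return True

# Toy example: pretend the inner IDs decode to a JSON tool call.
fake_decode = lambda ids: '{"tool": "write_file"}'
print(tool_call_ok([5, 128015, 1, 2, 3, 128016], fake_decode))  # → True
```

Run it on both prompts; any False is a hard failure for this step.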

If the model still code-dumps, the training didn't work. Check:

  • Were tokens 128015/128016 in the training data? (Step 2)
  • Is embed_tokens in the LoRA targets?
  • Was there enough data / enough epochs?

Step 7: Deploy to vLLM

Once validation passes:

  1. Copy the merged model to the vLLM model directory
  2. Update the vLLM docker-compose to point at the merged model
  3. Restart vLLM
  4. Run the streaming tool call tests from /home/openclaw/dev/model-tool-tests
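For step 2, the vLLM service entry might look like the following. This is a config sketch only: the service name, image tag, port, and mount paths are assumptions for illustration, so adapt them to the existing compose file:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest   # assumed tag; match your existing one
    command: >
      --model /models/SmolLM3-3B-toolcall
      --served-model-name smollm3-toolcall
    volumes:
      - /data/merged-model:/models/SmolLM3-3B-toolcall
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```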

Troubleshooting

| Problem | Fix |
|---|---|
| CUDA OOM | Reduce BATCH_SIZE to 2, increase GRAD_ACCUM to 8, or reduce MAX_LENGTH to 2048 |
| Dataset download fails | Check internet access; can pre-download and mount into /data |
| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
| Training loss not decreasing | Check LR; verify labels aren't all -100; verify token IDs in data (Step 2) |
| Model still code-dumps after training | Verify embed_tokens in targets; try more epochs; try lora_r=32 |
| Model emits tokens but broken JSON | Need more diverse tool-call samples; increase MAX_LENGTH |
| Model emits tool tokens for everything | Overfitting: add 30% non-tool instruction data to training mix |
| Disk full | Clean up /srv/smollora/hf-cache after model loads |
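For the "tool tokens for everything" fix, the 30% mix can be built with a small helper. A sketch: the 30% ratio comes from the table above, while the function name and shuffling details are illustrative:

```python
import random

def mix_training_data(tool_samples, plain_samples, plain_frac=0.30, seed=0):
    """Blend tool-call samples with plain instruction samples so plain
    data makes up roughly `plain_frac` of the final set."""
    n_plain = round(len(tool_samples) * plain_frac / (1.0 - plain_frac))
    rng = random.Random(seed)
    mixed = tool_samples + rng.sample(plain_samples, min(n_plain, len(plain_samples)))
    rng.shuffle(mixed)
    return mixed

# 70 tool samples + ~30 plain samples ≈ 30% plain overall.
mixed = mix_training_data(list(range(70)), list(range(100, 300)))
print(len(mixed))  # → 100
```

Keep the seed fixed so reruns of data prep produce the same shuffle.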

Rollback

docker compose down
rm -rf /srv/smollora/output/*
# Fix whatever broke, then:
docker compose up -d