# SmolLM3-3B LoRA Training — Deployment Runbook
## Objective

Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens
(IDs 128015/128016) instead of code-dumping. See TRAINING_PLAN.md for the
full strategy.
## Prerequisites

- GPU server accessible via SSH (`root@107.191.43.158`)
- Docker + NVIDIA Container Toolkit installed
- This repo cloned at `/root/smollora` on the GPU server
## Step 1: Sync the Code

From the OpenClaw workspace, push any changes:

```bash
cd /home/openclaw/dev/smollora
git add -A && git commit -m "updates" && git push
```

On the GPU server, pull the latest:

```bash
ssh root@107.191.43.158
cd /root/smollora && git pull
```

**Rule:** Always mutate code on the OpenClaw side, push, then pull on the GPU server. Never edit files directly on the server — changes won't propagate back.
## Step 2: Verify the Data Prep Will Produce Correct Tokens

Before training, confirm the processed data will contain token IDs 128015/128016. After data prep runs (or as a dry run), check:

```python
from transformers import AutoTokenizer
import json

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

with open("/data/processed/train.jsonl") as f:
    sample = json.loads(f.readline())

text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
ids = tokenizer.encode(text)
assert 128015 in ids, "Tool call start token (128015) missing — data prep is broken!"
assert 128016 in ids, "Tool call end token (128016) missing — data prep is broken!"
print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")
```

If this fails, do NOT proceed to training. Fix `prepare_data.py` or the tokenizer first.
## Step 3: Build & Run

```bash
cd /root/smollora
docker compose build
docker compose up -d
```

Verify it's running:

```bash
docker compose ps
docker compose logs --tail=20
```
## Step 4: Exec In & Run Training

```bash
docker compose exec smollora bash
```

Inside the container:

```bash
# Full pipeline (data prep + train)
/app/run.sh

# Or skip data prep if already done
SKIP_PREP=1 /app/run.sh
```
### Key Training Parameters for This Run

These should be set in `run.sh` or passed as env vars:

| Param | Value | Notes |
|---|---|---|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model |
| `EPOCHS` | `3` | Increase to 5 if val loss still dropping |
| `LR` | `2e-4` | Drop to 1e-4 if loss spikes |
| `LORA_R` | `16` | Bump to 32 if loss plateaus |
| `BATCH_SIZE` | `4` | Reduce to 2 if OOM |
| `MAX_LENGTH` | `4096` | Enough for tool calls + code |
**Critical:** `embed_tokens` MUST be in the LoRA target modules. Verify in `train_lora.py` that `target_modules` includes `"embed_tokens"`. Without it, the adapter can't adjust the tool-call token embeddings and the model won't learn to emit them.
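A cheap guard against that failure mode is to assert on the target list before launching. This is a sketch: `TARGET_MODULES` stands in for whatever `train_lora.py` actually defines, and the attention/MLP names are typical for Llama-style models rather than confirmed from the script.

```python
# Hypothetical guard: fail fast if embed_tokens is missing from the targets.
# The projection names below are assumed (typical Llama-style modules).
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "embed_tokens",  # required so the adapter can move tool-call embeddings
]

def check_targets(targets):
    """Raise if embed_tokens is absent from the LoRA target modules."""
    if "embed_tokens" not in targets:
        raise ValueError(
            "embed_tokens missing from target_modules; "
            "tool-call token embeddings will not train"
        )
    return True

check_targets(TARGET_MODULES)
```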
## Step 5: Monitor

From the host:

```bash
docker compose logs -f
watch -n5 nvidia-smi
```

Watch for:

- ✅ Training loss decreasing steadily
- ✅ Val loss decreasing (not diverging from train loss)
- ✅ No CUDA OOM
- ❌ Val loss increasing while train loss decreases = overfitting → reduce epochs or add more data

Expected timeline on a single A100:

- Data prep: ~10-20 min
- Training (3 epochs, ~15k samples): ~1-3 hours
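The overfitting signal above can also be checked mechanically. A minimal sketch, assuming you can pull `(train_loss, val_loss)` pairs per eval step out of the logs (the log format and function name are assumptions):

```python
# Hypothetical overfitting check: flag the run once val loss rises on an
# eval step where train loss kept falling.
def is_overfitting(history):
    """history: list of (train_loss, val_loss) pairs, one per eval step."""
    for (t0, v0), (t1, v1) in zip(history, history[1:]):
        if t1 < t0 and v1 > v0:
            return True
    return False

healthy = [(2.1, 2.2), (1.6, 1.8), (1.2, 1.5)]
diverging = [(2.1, 2.2), (1.6, 1.8), (1.2, 2.0)]
```

A single uptick can be noise; in practice you would want this to trigger only after two or three consecutive divergent eval steps.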
## Step 6: Validate — Raw Token Emission Test

Do not deploy until this passes.

1. Merge the LoRA adapter into the base model:

   ```python
   from peft import PeftModel
   from transformers import AutoModelForCausalLM, AutoTokenizer

   base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   model = PeftModel.from_pretrained(base, "/data/lora-output/final")
   merged = model.merge_and_unload()
   merged.save_pretrained("/data/merged-model")
   tokenizer.save_pretrained("/data/merged-model")
   ```

2. Copy the merged model to the chat-template-debugger:

   ```bash
   cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
   ```

3. Run the raw token debugger (stage 1):

   ```bash
   docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
     -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
     ct-debug-run python3 /workspace/scripts/stage1_debug.py
   ```

4. Pass criteria:
   - Token IDs 128015 and 128016 appear in the output
   - Valid JSON follows token 128015
   - No Python code-dumping

5. Also test with the `smol_save_config.txt` prompt — same criteria.
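The pass criteria can be scripted rather than eyeballed. A sketch assuming you already have the generated token IDs and the decoded text between the tool tokens; the function name is illustrative, not part of the debugger scripts:

```python
import json

TOOL_START, TOOL_END = 128015, 128016

def check_tool_call(output_ids, payload_text):
    """Apply the pass criteria to one generation.

    output_ids: token IDs emitted by the model.
    payload_text: decoded text between the start/end tool tokens,
    e.g. tokenizer.decode(ids[start + 1:end]).
    """
    if TOOL_START not in output_ids or TOOL_END not in output_ids:
        return False, "tool-call tokens missing"
    try:
        json.loads(payload_text)
    except json.JSONDecodeError:
        return False, "payload is not valid JSON"
    return True, "ok"
```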
If the model still code-dumps, the training didn't work. Check:

- Were tokens 128015/128016 in the training data? (Step 2)
- Is `embed_tokens` in the LoRA targets?
- Was there enough data / enough epochs?
## Step 7: Deploy to vLLM

Once validation passes:

- Copy the merged model to the vLLM model directory
- Update the vLLM docker-compose to point at the merged model
- Restart vLLM
- Run the streaming tool call tests from `/home/openclaw/dev/model-tool-tests`
## Troubleshooting

| Problem | Fix |
|---|---|
| CUDA OOM | Reduce `BATCH_SIZE` to 2, increase `GRAD_ACCUM` to 8, or reduce `MAX_LENGTH` to 2048 |
| Dataset download fails | Check internet; can pre-download and mount into `/data` |
| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
| Training loss not decreasing | Check LR; verify labels aren't all -100; verify token IDs in data (Step 2) |
| Model still code-dumps after training | Verify `embed_tokens` in targets; try more epochs; try `lora_r=32` |
| Model emits tokens but broken JSON | Need more diverse tool-call samples; increase `max_length` |
| Model emits tool tokens for everything | Overfit — add 30% non-tool instruction data to training mix |
| Disk full | Clean up `/srv/smollora/hf-cache` after model loads |
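For the "labels aren't all -100" row, a quick sketch of the check (the batch layout, a list of label rows, is an assumption about the collator's output):

```python
# Hypothetical check: confirm the collator is not masking every label.
# Tokens labeled -100 are ignored by the loss, so an all -100 batch
# contributes no gradient at all.
def frac_unmasked(label_rows):
    flat = [t for row in label_rows for t in row]
    return sum(t != -100 for t in flat) / len(flat)

batch = [[-100, -100, 128015, 42, 128016], [-100, 7, 8, -100, -100]]
assert frac_unmasked(batch) > 0, "all labels masked; the model has nothing to learn"
```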
## Rollback

```bash
docker compose down
rm -rf /srv/smollora/output/*
# Fix whatever broke, then:
docker compose up -d
```