# SmolLM3-3B LoRA Training — Deployment Runbook

## Objective

Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens
(IDs 128015/128016) instead of code-dumping. See `TRAINING_PLAN.md` for the
full strategy.

## Prerequisites

- [ ] GPU server accessible via SSH (`root@107.191.43.158`)
- [ ] Docker + NVIDIA Container Toolkit installed
- [ ] This repo cloned at `/root/smollora` on the GPU server

## Step 1: Sync the Code

From the OpenClaw workspace, push any changes:

```bash
cd /home/openclaw/dev/smollora
git add -A && git commit -m "updates" && git push
```

On the GPU server, pull the latest:

```bash
ssh root@107.191.43.158
cd /root/smollora && git pull
```

> **Rule:** Always mutate code on the OpenClaw side, push, then pull on the GPU server.
> Never edit files directly on the server — changes won't propagate back.

## Step 2: Verify the Data Prep Will Produce Correct Tokens

Before training, confirm the processed data will contain token IDs 128015/128016.
After data prep runs (or as a dry run), check:

```python
import json

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

with open("/data/processed/train.jsonl") as f:
    sample = json.loads(f.readline())

text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
# The templated text already contains special tokens, so don't prepend another BOS.
ids = tokenizer.encode(text, add_special_tokens=False)

assert 128015 in ids, "Tool call start token (128015) missing — data prep is broken!"
assert 128016 in ids, "Tool call end token (128016) missing — data prep is broken!"
print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")
```

If this fails, do NOT proceed to training. Fix `prepare_data.py` or the tokenizer first.

## Step 3: Build & Run

```bash
cd /root/smollora
docker compose build
docker compose up -d
```

Verify it's running:

```bash
docker compose ps
docker compose logs --tail=20
```

## Step 4: Exec In & Run Training

```bash
docker compose exec smollora bash
```

Inside the container:

```bash
# Full pipeline (data prep + train)
/app/run.sh

# Or skip data prep if already done
SKIP_PREP=1 /app/run.sh
```

### Key Training Parameters for This Run

These should be set in `run.sh` or passed as env vars:

| Param | Value | Notes |
|-------|-------|-------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model |
| `EPOCHS` | `3` | Increase to 5 if val loss still dropping |
| `LR` | `2e-4` | Drop to 1e-4 if loss spikes |
| `LORA_R` | `16` | Bump to 32 if loss plateaus |
| `BATCH_SIZE` | `4` | Reduce to 2 if OOM |
| `MAX_LENGTH` | `4096` | Enough for tool calls + code |

**Critical:** `embed_tokens` MUST be in the LoRA target modules. Verify in
`train_lora.py` that `target_modules` includes `"embed_tokens"`. Without it,
the adapter can't adjust the tool-call token embeddings and the model won't
learn to emit them.
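
As a sketch of what that requirement looks like in config form (the module names assume SmolLM3's Llama-style layer layout, and the `lora_alpha`/`lora_dropout` values are illustrative assumptions, not values from this runbook):

```python
# Sketch only: module names assume SmolLM3's Llama-style layer naming.
# The authoritative list lives in train_lora.py.
lora_kwargs = dict(
    r=16,                  # LORA_R from the table above
    lora_alpha=32,         # assumed value, not specified in this runbook
    lora_dropout=0.05,     # assumed value, not specified in this runbook
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "embed_tokens",                          # REQUIRED for tool-call tokens
    ],
)

# Without "embed_tokens" the adapter never touches the embedding rows for
# token IDs 128015/128016, so the model cannot learn to emit them.
assert "embed_tokens" in lora_kwargs["target_modules"]
```

In `train_lora.py` this would feed `peft`, e.g. `get_peft_model(base_model, LoraConfig(**lora_kwargs))`.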

## Step 5: Monitor

From the host:

```bash
docker compose logs -f
watch -n5 nvidia-smi
```

Watch for:

- ✅ Training loss decreasing steadily
- ✅ Val loss decreasing (not diverging from train loss)
- ✅ No CUDA OOM
- ❌ Val loss increasing while train loss decreases = overfitting → reduce epochs or add more data
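
The overfitting signal in the last bullet can be checked mechanically. A small sketch, operating on whatever loss history you scrape from the logs (the lists below are hypothetical):

```python
def overfit_signal(train_losses, val_losses, patience=2):
    """Flag overfitting: val loss rising for `patience` evals while train loss falls."""
    if len(val_losses) < patience + 1:
        return False
    val_rising = all(val_losses[-i] > val_losses[-i - 1] for i in range(1, patience + 1))
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return val_rising and train_falling

# Hypothetical histories: train keeps dropping while val turns upward.
assert overfit_signal([2.0, 1.5, 1.2, 1.0], [1.8, 1.4, 1.45, 1.5])
# Both still dropping: healthy.
assert not overfit_signal([2.0, 1.5, 1.2, 1.0], [1.8, 1.4, 1.3, 1.2])
```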

Expected timeline on a single A100:

- Data prep: ~10-20 min
- Training (3 epochs, ~15k samples): ~1-3 hours

## Step 6: Validate — Raw Token Emission Test

**Do not deploy until this passes.**

1. Merge the LoRA adapter into the base model:

   ```python
   from peft import PeftModel
   from transformers import AutoModelForCausalLM, AutoTokenizer

   base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   model = PeftModel.from_pretrained(base, "/data/lora-output/final")
   merged = model.merge_and_unload()
   merged.save_pretrained("/data/merged-model")
   tokenizer.save_pretrained("/data/merged-model")
   ```

2. Copy the merged model to the chat-template-debugger:

   ```bash
   cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
   ```

3. Run the raw token debugger (stage 1):

   ```bash
   docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
     -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
     ct-debug-run python3 /workspace/scripts/stage1_debug.py
   ```

4. **Pass criteria:**
   - Token IDs **128015** and **128016** appear in the output
   - Valid JSON follows token 128015
   - No Python code-dumping

5. Also test with `smol_save_config.txt` prompt — same criteria.
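
The pass criteria lend themselves to an automated check. A sketch, where the ID list and decode function are stand-ins for the debugger's real output:

```python
import json

TOOL_START, TOOL_END = 128015, 128016

def check_tool_call(output_ids, decode):
    """Check that a generation contains TOOL_START, valid JSON, then TOOL_END."""
    if TOOL_START not in output_ids or TOOL_END not in output_ids:
        return False, "tool-call tokens missing"
    start = output_ids.index(TOOL_START)
    end = output_ids.index(TOOL_END, start)
    payload = decode(output_ids[start + 1:end])
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False, "payload between tokens is not valid JSON"
    return True, "ok"

# Hypothetical output: the ids between the markers decode to a JSON tool call.
fake_ids = [1, 2, TOOL_START, 10, 11, TOOL_END, 3]
fake_decode = lambda ids: '{"name": "write_file", "arguments": {"path": "a.txt"}}'
ok, why = check_tool_call(fake_ids, fake_decode)
assert ok, why
```

With the real model, `output_ids` would come from the stage-1 debugger and `decode` would be `tokenizer.decode`.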

If the model still code-dumps, the training didn't work. Check:

- Were tokens 128015/128016 in the training data? (Step 2)
- Is `embed_tokens` in the LoRA targets?
- Was there enough data / enough epochs?

## Step 7: Deploy to vLLM

Once validation passes:

1. Copy the merged model to the vLLM model directory
2. Update the vLLM docker-compose to point at the merged model
3. Restart vLLM
4. Run the streaming tool call tests from `/home/openclaw/dev/model-tool-tests`

## Troubleshooting

| Problem | Fix |
|---------|-----|
| CUDA OOM | Reduce `BATCH_SIZE` to 2, increase `GRAD_ACCUM` to 8, or reduce `MAX_LENGTH` to 2048 |
| Dataset download fails | Check internet; can pre-download and mount into `/data` |
| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
| Training loss not decreasing | Check LR; verify labels aren't all -100; verify token IDs in data (Step 2) |
| Model still code-dumps after training | Verify `embed_tokens` in targets; try more epochs; try `LORA_R=32` |
| Model emits tokens but broken JSON | Need more diverse tool-call samples; increase `MAX_LENGTH` |
| Model emits tool tokens for everything | Overfit — add 30% non-tool instruction data to training mix |
| Disk full | Clean up `/srv/smollora/hf-cache` after model loads |
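
The "labels aren't all -100" check can be scripted. A sketch, using the standard HF convention that -100 marks positions excluded from the loss (the sample labels below are hypothetical):

```python
def trainable_fraction(labels):
    """Fraction of label positions that actually contribute to the loss."""
    trainable = sum(1 for t in labels if t != -100)
    return trainable / max(len(labels), 1)

# Hypothetical sample: a long masked prompt followed by a short tool-call target.
labels = [-100] * 50 + [128015, 1234, 5678, 128016]
frac = trainable_fraction(labels)
assert frac > 0, "All labels are -100; nothing for this sample to teach"
print(f"{frac:.1%} of positions contribute to the loss")
```

Run this over a handful of processed samples; a fraction of zero on any of them means the loss masking in `prepare_data.py` is wrong.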

## Rollback

```bash
docker compose down
rm -rf /srv/smollora/output/*
# Fix whatever broke, then:
docker compose up -d
```