# SmolLM3-3B LoRA Training: Deployment Runbook

## Objective

Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens (IDs 128015/128016) instead of code-dumping. See `TRAINING_PLAN.md` for the full strategy.

## Prerequisites

- [ ] GPU server accessible via SSH (`root@107.191.43.158`)
- [ ] Docker + NVIDIA Container Toolkit installed
- [ ] This repo cloned at `/root/smollora` on the GPU server

## Step 1: Sync the Code

From the OpenClaw workspace, push any changes:

```bash
cd /home/openclaw/dev/smollora
git add -A && git commit -m "updates" && git push
```

On the GPU server, pull the latest:

```bash
ssh root@107.191.43.158
cd /root/smollora && git pull
```

> **Rule:** Always mutate code on the OpenClaw side, push, then pull on the GPU server.
> Never edit files directly on the server; changes won't propagate back.

## Step 2: Verify the Data Prep Will Produce Correct Tokens

Before training, confirm the processed data will contain token IDs 128015/128016. After data prep runs (or as a dry run), check:

```python
import json

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

with open("/data/processed/train.jsonl") as f:
    sample = json.loads(f.readline())

text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
ids = tokenizer.encode(text)

assert 128015 in ids, "Tool call start token (128015) missing: data prep is broken!"
assert 128016 in ids, "Tool call end token (128016) missing: data prep is broken!"
print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")
```

If this fails, do NOT proceed to training. Fix `prepare_data.py` or the tokenizer first.

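The Step 2 check inspects only the first sample and only tests for presence. As a minimal sketch of a stricter check, the helper below confirms that every start token (128015) is closed by an end token (128016) before another one opens. The function name `tool_call_spans` and the use of plain lists of token IDs are assumptions for illustration; nothing here is taken from `prepare_data.py`.

```python
from typing import List, Tuple

TOOL_CALL_START = 128015  # token IDs as stated in this runbook
TOOL_CALL_END = 128016


def tool_call_spans(ids: List[int],
                    start_id: int = TOOL_CALL_START,
                    end_id: int = TOOL_CALL_END) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) for each well-formed tool-call span.

    Raises ValueError on a nested start, a stray end, or an unclosed start.
    """
    spans = []
    open_at = None  # index of the currently open start token, if any
    for i, tok in enumerate(ids):
        if tok == start_id:
            if open_at is not None:
                raise ValueError(f"nested start token at index {i}")
            open_at = i
        elif tok == end_id:
            if open_at is None:
                raise ValueError(f"end token without a start at index {i}")
            spans.append((open_at, i))
            open_at = None
    if open_at is not None:
        raise ValueError(f"start token at index {open_at} is never closed")
    return spans


if __name__ == "__main__":
    demo = [1, 128015, 42, 43, 128016, 2]  # hypothetical token IDs
    print(tool_call_spans(demo))
```

In practice, the `ids` list produced by `tokenizer.encode(text)` could be passed in for each line of `train.jsonl`. An empty result on a sample that should contain a tool call, or a `ValueError`, would point at the same data prep problem as the assertions in Step 2.
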
## Step 3: Build & Run

```bash
cd /root/smollora
docker compose build
docker compose up -d
```

Verify it's running:

```bash
docker compose ps
docker compose logs --tail=20
```

## Step 4: Exec In & Run Training

```bash
docker compose exec smollora bash
```

Inside the container:

```bash
# Full pipeline (data prep + train)
/app/run.sh

# Or skip data prep if already done
SKIP_PREP=1 /app/run.sh
```

### Key Training Parameters for This Run

These should be set in `run.sh` or passed as env vars:

| Param | Value | Notes |
|-------|-------|-------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model |
| `EPOCHS` | `3` | Increase to 5 if val loss is still dropping |
| `LR` | `2e-4` | Drop to 1e-4 if loss spikes |
| `LORA_R` | `16` | Bump to 32 if loss plateaus |
| `BATCH_SIZE` | `4` | Reduce to 2 if OOM |
| `MAX_LENGTH` | `4096` | Enough for tool calls + code |

**Critical:** `embed_tokens` MUST be in the LoRA target modules. Verify in `train_lora.py` that `target_modules` includes `"embed_tokens"`. Without it, the adapter can't adjust the tool-call token embeddings and the model won't learn to emit them.

## Step 5: Monitor

From the host:

```bash
docker compose logs -f

# In a second terminal
watch -n5 nvidia-smi
```

Watch for:

- ✅ Training loss decreasing steadily
- ✅ Val loss decreasing (not diverging from train loss)
- ✅ No CUDA OOM
- ❌ Val loss increasing while train loss decreases = overfitting → reduce epochs or add more data

Expected timeline on a single A100:

- Data prep: ~10-20 min
- Training (3 epochs, ~15k samples): ~1-3 hours

## Step 6: Validate (Raw Token Emission Test)

**Do not deploy until this passes.**

1. Merge the LoRA adapter into the base model:

   ```python
   from peft import PeftModel
   from transformers import AutoModelForCausalLM, AutoTokenizer

   base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
   tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)

   # Attach the trained adapter, then fold its weights into the base model
   model = PeftModel.from_pretrained(base, "/data/lora-output/final")
   merged = model.merge_and_unload()

   merged.save_pretrained("/data/merged-model")
   tokenizer.save_pretrained("/data/merged-model")
   ```

2. Copy the merged model to the chat-template-debugger:

   ```bash
   cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
   ```

3. Run the raw token debugger (stage 1):

   ```bash
   docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
     -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
     ct-debug-run python3 /workspace/scripts/stage1_debug.py
   ```

4. **Pass criteria:**
   - Token IDs **128015** and **128016** appear in the output
   - Valid JSON follows token 128015
   - No Python code-dumping

5. Also test with the `smol_save_config.txt` prompt; the same criteria apply.

If the model still code-dumps, the training didn't work. Check:

- Were tokens 128015/128016 in the training data? (Step 2)
- Is `embed_tokens` in the LoRA targets?
- Was there enough data / enough epochs?

## Step 7: Deploy to vLLM

Once validation passes:

1. Copy the merged model to the vLLM model directory
2. Update the vLLM docker-compose to point at the merged model
3. Restart vLLM
4. Run the streaming tool call tests from `/home/openclaw/dev/model-tool-tests`

## Troubleshooting

| Problem | Fix |
|---------|-----|
| CUDA OOM | Reduce `BATCH_SIZE` to 2, increase `GRAD_ACCUM` to 8, or reduce `MAX_LENGTH` to 2048 |
| Dataset download fails | Check internet; can pre-download and mount into `/data` |
| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
| Training loss not decreasing | Check `LR`; verify labels aren't all -100; verify token IDs in data (Step 2) |
| Model still code-dumps after training | Verify `embed_tokens` is in the targets; try more epochs; try `LORA_R=32` |
| Model emits tokens but broken JSON | Need more diverse tool-call samples; increase `MAX_LENGTH` |
| Model emits tool tokens for everything | Overfit: add 30% non-tool instruction data to the training mix |
| Disk full | Clean up `/srv/smollora/hf-cache` after the model loads |

## Rollback

```bash
docker compose down
rm -rf /srv/smollora/output/*
# Fix whatever broke, then:
docker compose up -d
```
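
When working out what to fix before restarting, the Troubleshooting hint "verify labels aren't all -100" can be checked offline. The following is a minimal sketch, assuming each training example exposes its `labels` as a plain list of integers in which -100 marks positions excluded from the loss (the default `ignore_index` of PyTorch's cross-entropy loss). The helper names `supervised_fraction` and `fully_masked_samples` are hypothetical and not taken from `train_lora.py`.

```python
from typing import Iterable, List

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def supervised_fraction(labels: List[int], ignore_index: int = IGNORE_INDEX) -> float:
    """Fraction of label positions that contribute to the loss."""
    if not labels:
        return 0.0
    kept = sum(1 for tok in labels if tok != ignore_index)
    return kept / len(labels)


def fully_masked_samples(batch: Iterable[List[int]]) -> List[int]:
    """Indices of samples whose labels are entirely masked out."""
    return [i for i, labels in enumerate(batch)
            if supervised_fraction(labels) == 0.0]


if __name__ == "__main__":
    batch = [
        [-100, -100, 128015, 42, 128016],  # hypothetical: assistant turn is supervised
        [-100, -100, -100],                # hypothetical: nothing left to learn from
    ]
    print([supervised_fraction(x) for x in batch])
    print(fully_masked_samples(batch))
```

If `fully_masked_samples` returns most of a batch, the loss has almost nothing to optimize, which would be consistent with a training loss that does not decrease.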