diff --git a/RUNBOOK.md b/RUNBOOK.md
index 2d5b136..571164c 100644
--- a/RUNBOOK.md
+++ b/RUNBOOK.md
@@ -1,50 +1,62 @@
 # SmolLM3-3B LoRA Training — Deployment Runbook
 
+## Objective
+
+Train a LoRA adapter that teaches SmolLM3-3B to emit native tool-call tokens
+(IDs 128015/128016) instead of code-dumping. See `TRAINING_PLAN.md` for the
+full strategy.
+
 ## Prerequisites
 
-- [ ] GPU server deployed and accessible via SSH
-- [ ] SSH creds from Mike (host, user, key/password)
-- [ ] Docker + Docker Compose + NVIDIA Container Toolkit installed on GPU server
-- [ ] This repo at `/home/openclaw/dev/smollora`
+- [ ] GPU server accessible via SSH (`root@107.191.43.158`)
+- [ ] Docker + NVIDIA Container Toolkit installed
+- [ ] This repo cloned at `/root/smollora` on the GPU server
 
-## Step 1: SSH In & Prep the Host
+## Step 1: Sync the Code
 
+From the OpenClaw workspace, push any changes:
 ```bash
-ssh @
+cd /home/openclaw/dev/smollora
+git add -A && git commit -m "updates" && git push
 ```
 
-Verify GPU is visible:
+On the GPU server, pull the latest:
 ```bash
-nvidia-smi
-docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
+ssh root@107.191.43.158
+cd /root/smollora && git pull
 ```
 
-If Docker/nvidia toolkit missing, install before continuing.
+> **Rule:** Always mutate code on the OpenClaw side, push, then pull on the GPU server.
+> Never edit files directly on the server — changes won't propagate back.
 
-## Step 2: Create Persistent Directories
+## Step 2: Verify the Data Prep Will Produce Correct Tokens
 
-```bash
-sudo mkdir -p /srv/smollora/{data,output,hf-cache}
-sudo chown -R $(whoami):$(whoami) /srv/smollora
+Before training, confirm the processed data will contain token IDs 128015/128016.
+After data prep runs (or as a dry run), check:
+
+```python
+from transformers import AutoTokenizer
+import json
+
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
+
+with open("/data/processed/train.jsonl") as f:
+    sample = json.loads(f.readline())
+
+text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
+ids = tokenizer.encode(text)
+
+assert 128015 in ids, "Tool call start token (128015) missing — data prep is broken!"
+assert 128016 in ids, "Tool call end token (128016) missing — data prep is broken!"
+print(f"✓ Token IDs verified. Sample has {len(ids)} tokens, tool-call tokens present.")
 ```
 
-## Step 3: Copy Project Files to Server
+If this fails, do NOT proceed to training. Fix `prepare_data.py` or the tokenizer first.
 
-From the OpenClaw box:
-```bash
-scp -r /home/openclaw/dev/smollora @:/tmp/smollora
-```
-
-On the GPU server:
-```bash
-cp -r /tmp/smollora ~/smollora
-cd ~/smollora
-```
-
-## Step 4: Build & Start Container
+## Step 3: Build & Run
 
 ```bash
-cd ~/smollora
+cd /root/smollora
 docker compose build
 docker compose up -d
 ```
@@ -55,57 +67,107 @@ docker compose ps
 docker compose logs --tail=20
 ```
 
-## Step 5: Exec In & Kick Off Training
+## Step 4: Exec In & Run Training
 
 ```bash
 docker compose exec smollora bash
 ```
 
-Inside the container, run the pipeline:
+Inside the container:
+
 ```bash
+# Full pipeline (data prep + train)
 /app/run.sh
-```
 
-Watch for:
-- ✅ Datasets downloading successfully
-- ✅ Samples counted (should be thousands)
-- ✅ Model loading without OOM
-- ✅ First few training steps completing (check loss is decreasing)
-- ✅ No CUDA OOM errors in first 50 steps
-
-If data prep already ran and you just want to re-train:
-```bash
+# Or skip data prep if already done
 SKIP_PREP=1 /app/run.sh
 ```
 
-## Step 6: Monitor
+### Key Training Parameters for This Run
 
-From the host (no need to stay in the container):
+These should be set in `run.sh` or passed as env vars:
+
+| Param | Value | Notes |
+|-------|-------|-------|
+| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model |
+| `EPOCHS` | `3` | Increase to 5 if val loss still dropping |
+| `LR` | `2e-4` | Drop to 1e-4 if loss spikes |
+| `LORA_R` | `16` | Bump to 32 if loss plateaus |
+| `BATCH_SIZE` | `4` | Reduce to 2 if OOM |
+| `MAX_LENGTH` | `4096` | Enough for tool calls + code |
+
+**Critical:** `embed_tokens` MUST be in the LoRA target modules. Verify in
+`train_lora.py` that `target_modules` includes `"embed_tokens"`. Without it,
+the adapter can't adjust the tool-call token embeddings and the model won't
+learn to emit them.
+
+## Step 5: Monitor
+
+From the host:
 ```bash
-# Follow logs
 docker compose logs -f
-
-# GPU utilization
 watch -n5 nvidia-smi
 ```
 
+Watch for:
+- ✅ Training loss decreasing steadily
+- ✅ Val loss decreasing (not diverging from train loss)
+- ✅ No CUDA OOM
+- ❌ Val loss increasing while train loss decreases = overfitting → reduce epochs or add more data
+
 Expected timeline on a single A100:
 - Data prep: ~10-20 min
-- Training (3 epochs, ~20-40k samples): ~2-4 hours
-- Could be longer on smaller GPUs
+- Training (3 epochs, ~15k samples): ~1-3 hours
 
-## Step 7: Verify Output
+## Step 6: Validate — Raw Token Emission Test
 
-```bash
-# Check the LoRA adapter was saved
-ls -la /srv/smollora/output/final/
-# Should see: adapter_config.json, adapter_model.safetensors, tokenizer files
-```
+**Do not deploy until this passes.**
 
-## Step 8: Notify Mike
+1. Merge the LoRA adapter into the base model:
+   ```python
+   from peft import PeftModel
+   from transformers import AutoModelForCausalLM, AutoTokenizer
 
-Message Mike:
-> 🎭 LoRA training is running on the GPU box. Data prep done, [N] samples, training started at [time]. Estimated completion: [est]. I'll check back periodically — will ping you if anything blows up or when it finishes.
+   base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
+   tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B", trust_remote_code=True)
+   model = PeftModel.from_pretrained(base, "/data/lora-output/final")
+   merged = model.merge_and_unload()
+   merged.save_pretrained("/data/merged-model")
+   tokenizer.save_pretrained("/data/merged-model")
+   ```
+
+2. Copy the merged model to the chat-template-debugger:
+   ```bash
+   cp -r /data/merged-model /root/chat-template-debugger/models/SmolLM3-3B-toolcall
+   ```
+
+3. Run the raw token debugger (stage 1):
+   ```bash
+   docker exec -e MODEL_PATH=/workspace/models/SmolLM3-3B-toolcall \
+     -e PROMPT_FILE=/workspace/prompts/smol_write_file.txt \
+     ct-debug-run python3 /workspace/scripts/stage1_debug.py
+   ```
+
+4. **Pass criteria:**
+   - Token IDs **128015** and **128016** appear in the output
+   - Valid JSON follows token 128015
+   - No Python code-dumping
+
+5. Also test with `smol_save_config.txt` prompt — same criteria.
+
+If the model still code-dumps, the training didn't work. Check:
+- Were tokens 128015/128016 in the training data? (Step 2)
+- Is `embed_tokens` in the LoRA targets?
+- Was there enough data / enough epochs?
+
+## Step 7: Deploy to vLLM
+
+Once validation passes:
+
+1. Copy the merged model to the vLLM model directory
+2. Update the vLLM docker-compose to point at the merged model
+3. Restart vLLM
+4. Run the streaming tool call tests from `/home/openclaw/dev/model-tool-tests`
 
 ## Troubleshooting
 
@@ -113,13 +175,15 @@ Message Mike:
 |---------|-----|
 | CUDA OOM | Reduce `BATCH_SIZE` to 2, increase `GRAD_ACCUM` to 8, or reduce `MAX_LENGTH` to 2048 |
 | Dataset download fails | Check internet; can pre-download and mount into `/data` |
-| Docker can't see GPU | Install nvidia-container-toolkit: `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
-| Training loss not decreasing | Check LR — try `1e-4` or `5e-5`; verify labels aren't all -100 |
-| Disk full | Clean up `/srv/smollora/hf-cache` after model loads; processed data is small |
+| Docker can't see GPU | `sudo apt-get install -y nvidia-container-toolkit && sudo systemctl restart docker` |
+| Training loss not decreasing | Check LR; verify labels aren't all -100; verify token IDs in data (Step 2) |
+| Model still code-dumps after training | Verify `embed_tokens` is in `target_modules`; try more epochs; try `LORA_R=32` |
+| Model emits tokens but broken JSON | Add more diverse tool-call samples; increase `MAX_LENGTH` |
+| Model emits tool tokens for everything | Overfit — add 30% non-tool instruction data to training mix |
+| Disk full | Clean up `/srv/smollora/hf-cache` after model loads |
 
 ## Rollback
 
-If everything goes sideways:
 ```bash
 docker compose down
 rm -rf /srv/smollora/output/*