# SmolLM3-3B LoRA: Tool Calling Fine-Tune

LoRA adapter training to teach SmolLM3-3B to emit native tool-call tokens.

## Critical Training Objective

The base model **does not emit structured tool-call tokens**. When asked to use tools, it writes Python code that *calls* the tool as a function instead of emitting the `startPos`/`endPos` (token IDs 128002/128016) sequences that vLLM's Hermes parser expects. This was verified definitively using a raw token inspector (`/home/openclaw/dev/chat-template-debugger/`) that bypasses all middleware and calls `llm.generate()` directly.

**The #1 priority for this LoRA run** is to make the model emit tool-call tokens natively. Specifically:

1. When the user asks the model to use a tool, the model should emit `startPos` → JSON function call → `endPos` instead of writing `from tools import X` / `X(args)` as Python code
2. This must work for **all** tool patterns: not just structured JSON tools (`save_config`) but also code-generation tools (`write_file`) that the model currently code-dumps instead of calling
3. The model should still produce clean text content when NOT invoking a tool; we're adding a capability, not replacing one

### Why this matters

The current "working" `save_config` path through the vLLM API is not actually the model doing tool calls: the Hermes parser is reconstructing tool calls from the model's text/code output. This is fragile and fails for longer outputs (`write_file`). Once the model emits native tool-call tokens, both paths work correctly and the parser doesn't need to do salvage work.

## Quick Start

```bash
# Build
docker build -t smollora .
# Run full pipeline (prepare data + train)
docker run --gpus all \
  -v /path/on/host/output:/data/lora-output \
  smollora

# Skip data prep if you already have processed data
docker run --gpus all \
  -e SKIP_PREP=1 \
  -v /path/on/host/processed:/data/processed \
  -v /path/on/host/output:/data/lora-output \
  smollora
```

## Environment Variables

| Var | Default | Description |
|-----|---------|-------------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model (HF repo or local path) |
| `DATA_DIR` | `/data/processed` | Processed data directory |
| `OUTPUT_DIR` | `/data/lora-output` | Training output directory |
| `EPOCHS` | `3` | Training epochs |
| `BATCH_SIZE` | `4` | Per-device batch size |
| `LR` | `2e-4` | Learning rate |
| `LORA_R` | `16` | LoRA rank |
| `MAX_LENGTH` | `4096` | Max sequence length |
| `SKIP_PREP` | `0` | Set to `1` to skip data preparation |

## Datasets

Three datasets are combined and converted to SmolLM3's native token format:

1. **interstellarninja/tool-calls-multiturn**: Multi-turn tool calling conversations
2. **NousResearch/Hermes-Function-Calling-V1**: Hermes-format function calling
3. **Salesforce/xLAM-function-calling-60k**: Large-scale function calling (60k samples)

Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens:

- Tool calls → `startPos`/`endPos` (token IDs 128002/128016)
- Tool responses → `eni`/`eni_result` (token IDs 128013/128014)

## LoRA Configuration

- **Rank:** 16
- **Alpha:** 32
- **Target modules:** q/k/v/o projections + gate/up/down MLP
- **Dropout:** 0.05
- **Scheduler:** Cosine with 3% warmup
- **Optimizer:** AdamW (fused)
- **Gradient checkpointing:** Enabled

## Output

The trained adapter is saved to `$OUTPUT_DIR/final/`.
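For reference, the documented environment variables, the fixed LoRA hyperparameters, and the adapter location they imply can be gathered into one object. The following is a minimal sketch, not the container's actual entrypoint: `RunConfig`, its field names, and `from_env` are hypothetical, and the target module names assume the usual Llama-style naming (`q_proj`, `gate_proj`, and so on) for "q/k/v/o projections + gate/up/down MLP".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple

# Assumption: Llama-style module names for the documented target modules.
TARGET_MODULES: Tuple[str, ...] = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings documented in this README."""

    # Defaults copied from the Environment Variables table.
    model: str = "HuggingFaceTB/SmolLM3-3B"
    data_dir: str = "/data/processed"
    output_dir: str = "/data/lora-output"
    epochs: int = 3
    batch_size: int = 4
    lr: float = 2e-4
    lora_r: int = 16
    max_length: int = 4096
    skip_prep: bool = False
    # Fixed values from the LoRA Configuration section.
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    warmup_ratio: float = 0.03
    target_modules: Tuple[str, ...] = TARGET_MODULES

    @property
    def adapter_dir(self) -> str:
        # The README states the adapter is saved to $OUTPUT_DIR/final/.
        return f"{self.output_dir.rstrip('/')}/final"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RunConfig":
        d = cls()  # instance holding the documented defaults
        return cls(
            model=env.get("MODEL", d.model),
            data_dir=env.get("DATA_DIR", d.data_dir),
            output_dir=env.get("OUTPUT_DIR", d.output_dir),
            epochs=int(env.get("EPOCHS", d.epochs)),
            batch_size=int(env.get("BATCH_SIZE", d.batch_size)),
            lr=float(env.get("LR", d.lr)),
            lora_r=int(env.get("LORA_R", d.lora_r)),
            max_length=int(env.get("MAX_LENGTH", d.max_length)),
            skip_prep=env.get("SKIP_PREP", "0") == "1",
        )
```

Calling `RunConfig.from_env()` with no arguments reads the process environment; passing a plain dict makes the parsing easy to check without touching real variables.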
To use with vLLM:

```bash
# Merge adapter into base model (recommended for vLLM): load the adapter with
# peft's PeftModel.from_pretrained(), call merge_and_unload(), and save the result.
# Or pass the adapter path directly with --enable-lora
```

## SSH Deployment

```bash
# On GPU box, after SSH-ing in:
docker run --gpus all -v ~/smol-data:/data smollora

# Or with local model cache:
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/smol-data:/data \
  smollora
```
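The merge step mentioned under Output could look roughly like the sketch below. It assumes `torch`, `transformers`, and `peft` are installed; the helper names `adapter_paths` and `merge_adapter` and the `merged/` output directory are hypothetical and not part of this repo. Heavy imports sit inside the function so the module itself loads without a GPU stack.

```python
import os
from typing import Mapping, Tuple


def adapter_paths(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (adapter_dir, merged_dir) derived from OUTPUT_DIR."""
    out = env.get("OUTPUT_DIR", "/data/lora-output").rstrip("/")
    # "final" is where the README says the adapter is saved;
    # "merged" is an assumed name for the merged checkpoint.
    return f"{out}/final", f"{out}/merged"


def merge_adapter(base_model: str, adapter_dir: str, merged_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a plain checkpoint."""
    # Imported lazily: these pull in large dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # returns the base model with LoRA deltas applied
    merged.save_pretrained(merged_dir)
    # Save the tokenizer alongside so the directory is loadable on its own.
    AutoTokenizer.from_pretrained(base_model).save_pretrained(merged_dir)
```

A call such as `merge_adapter("HuggingFaceTB/SmolLM3-3B", *adapter_paths())` would then produce a directory that can be served like any other model checkpoint.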