# SmolLM3-3B LoRA — Tool Calling Fine-Tune

LoRA adapter training to make SmolLM3-3B a tool-calling savant.

## Quick Start

```bash
# Build
docker build -t smollora .

# Run full pipeline (prepare data + train)
docker run --gpus all \
  -v /path/on/host/output:/data/lora-output \
  smollora

# Skip data prep if you already have processed data
docker run --gpus all \
  -e SKIP_PREP=1 \
  -v /path/on/host/processed:/data/processed \
  -v /path/on/host/output:/data/lora-output \
  smollora
```

## Environment Variables

| Var | Default | Description |
|-----|---------|-------------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model (HF repo or local path) |
| `DATA_DIR` | `/data/processed` | Processed data directory |
| `OUTPUT_DIR` | `/data/lora-output` | Training output directory |
| `EPOCHS` | `3` | Training epochs |
| `BATCH_SIZE` | `4` | Per-device batch size |
| `LR` | `2e-4` | Learning rate |
| `LORA_R` | `16` | LoRA rank |
| `MAX_LENGTH` | `4096` | Max sequence length |
| `SKIP_PREP` | `0` | Set to `1` to skip data preparation |

## Datasets

Three datasets are combined and converted to SmolLM3's native token format:

1. **interstellarninja/tool-calls-multiturn** — Multi-turn tool-calling conversations
2. **NousResearch/Hermes-Function-Calling-V1** — Hermes-format function calling
3. **Salesforce/xLAM-function-calling-60k** — Large-scale function calling (60k samples)

Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens:

- Tool calls → `startPos`/`endPos` (token IDs 128002/128016)
- Tool responses → `eni`/`eni_result` (token IDs 128013/128014)

## LoRA Configuration

- **Rank:** 16
- **Alpha:** 32
- **Target modules:** q/k/v/o projections + gate/up/down MLP
- **Dropout:** 0.05
- **Scheduler:** Cosine with 3% warmup
- **Optimizer:** AdamW (fused)
- **Gradient checkpointing:** Enabled

## Output

The trained adapter is saved to `$OUTPUT_DIR/final/`.
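The filtering rule from the Datasets section ("only conversations containing tool calls are kept") can be sketched in plain Python. This is an illustration only: the `role`/`content` message schema and the `<tool_call>` marker string are assumptions, not the repo's actual normalized format.

```python
def has_tool_call(conversation):
    """Keep a conversation only if at least one assistant turn
    contains a tool call (hypothetical '<tool_call>' marker)."""
    return any(
        turn.get("role") == "assistant" and "<tool_call>" in turn.get("content", "")
        for turn in conversation
    )

convos = [
    [{"role": "user", "content": "hi"},
     {"role": "assistant", "content": "hello"}],
    [{"role": "user", "content": "weather in Paris?"},
     {"role": "assistant",
      "content": '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'}],
]

# Keeps only the second conversation, which actually calls a tool.
kept = [c for c in convos if has_tool_call(c)]
```

The real pipeline applies the same predicate per example across all three source datasets before token normalization.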
To use with vLLM:

```bash
# Merge adapter into base model (recommended for vLLM)
python - <<'EOF'
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
merged = PeftModel.from_pretrained(base, "/data/lora-output/final").merge_and_unload()
merged.save_pretrained("/data/smollm3-toolcall-merged")  # any output path works
EOF

# Or pass the adapter path directly with vLLM's --enable-lora
```

## SSH Deployment

```bash
# On the GPU box, after SSH-ing in:
docker run --gpus all -v ~/smol-data:/data smollora

# Or with a local model cache:
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/smol-data:/data \
  smollora
```
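If you skip the merge step and serve the unmerged adapter directly, a vLLM launch might look like this (a sketch following vLLM's LoRA flags; `toolcall` is an arbitrary adapter name, and the path matches the `$OUTPUT_DIR/final/` convention above):

```shell
# Serve the base model with the LoRA adapter attached at runtime
vllm serve HuggingFaceTB/SmolLM3-3B \
  --enable-lora \
  --lora-modules toolcall=/data/lora-output/final
```

Requests can then target the adapter by passing `"model": "toolcall"` in the OpenAI-compatible API.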