# SmolLM3-3B LoRA — Tool Calling Fine-Tune

LoRA adapter training to teach SmolLM3-3B to emit native tool-call tokens.

## Critical Training Objective

The base model **does not emit structured tool-call tokens**. When asked to use tools, it writes Python code that *calls* the tool as a function instead of emitting the `startPos`/`endPos` (token IDs 128002/128016) sequences that vLLM's Hermes parser expects. This was verified with a raw token inspector (`/home/openclaw/dev/chat-template-debugger/`) that bypasses all middleware and calls `llm.generate()` directly.

**The #1 priority for this LoRA run** is to make the model emit tool-call tokens natively. Specifically:

1. When the user asks the model to use a tool, the model should emit `startPos` → JSON function call → `endPos` instead of writing `from tools import X` / `X(args)` as Python code.
2. This must work for **all** tool patterns: not only structured JSON tools (`save_config`) but also code-generation tools (`write_file`), where the model currently dumps code instead of calling the tool.
3. The model should still produce clean text content when NOT invoking a tool. We're adding a capability, not replacing one.

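
The difference between a native tool call and a code dump can be checked mechanically on raw generated token IDs. The following is a minimal sketch, not the inspector's actual code: the two token IDs come from this README, while the function name `find_tool_call_span` is hypothetical and introduced only for illustration.

```python
from typing import List, Optional, Tuple

# Token IDs as stated in this README; everything else here is illustrative.
TOOL_CALL_START_ID = 128002
TOOL_CALL_END_ID = 128016


def find_tool_call_span(token_ids: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the first complete tool-call span.

    Returns None when the start token is missing, or when no end token
    follows it (for example, when the model wrote Python code instead).
    """
    try:
        start = token_ids.index(TOOL_CALL_START_ID)
        end = token_ids.index(TOOL_CALL_END_ID, start + 1)
    except ValueError:
        return None
    return start, end


if __name__ == "__main__":
    native = [11, TOOL_CALL_START_ID, 42, 43, TOOL_CALL_END_ID, 12]
    code_dump = [11, 42, 43, 12]
    print(find_tool_call_span(native))
    print(find_tool_call_span(code_dump))
```

The token IDs between the two indices would then be decoded with the model's tokenizer and parsed as JSON. A check like this could serve as a pass/fail criterion when comparing the base model against the adapter.
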
### Why this matters

The current "working" `save_config` path through the vLLM API is not actually the model making tool calls: the Hermes parser is reconstructing tool calls from the model's text/code output. This is fragile and fails for longer outputs (`write_file`). Once the model emits native tool-call tokens, both paths work correctly and the parser no longer needs to do salvage work.

## Quick Start

```bash
# Build
docker build -t smollora .

# Run full pipeline (prepare data + train)
docker run --gpus all \
  -v /path/on/host/output:/data/lora-output \
  smollora

# Skip data prep if you already have processed data
docker run --gpus all \
  -e SKIP_PREP=1 \
  -v /path/on/host/processed:/data/processed \
  -v /path/on/host/output:/data/lora-output \
  smollora
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model (HF repo or local path) |
| `DATA_DIR` | `/data/processed` | Processed data directory |
| `OUTPUT_DIR` | `/data/lora-output` | Training output directory |
| `EPOCHS` | `3` | Training epochs |
| `BATCH_SIZE` | `4` | Per-device batch size |
| `LR` | `2e-4` | Learning rate |
| `LORA_R` | `16` | LoRA rank |
| `MAX_LENGTH` | `4096` | Max sequence length |
| `SKIP_PREP` | `0` | Set to `1` to skip data preparation |

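
To show how these variables and defaults fit together, here is a small sketch of reading them into a typed config. It is an assumption about how a training script might consume them, not the repository's actual code; `TrainConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the variables in the table above."""
    model: str
    data_dir: str
    output_dir: str
    epochs: int
    batch_size: int
    lr: float
    lora_r: int
    max_length: int
    skip_prep: bool


def load_config(env: Mapping[str, str] = os.environ) -> TrainConfig:
    """Read settings from `env`, falling back to the documented defaults."""
    return TrainConfig(
        model=env.get("MODEL", "HuggingFaceTB/SmolLM3-3B"),
        data_dir=env.get("DATA_DIR", "/data/processed"),
        output_dir=env.get("OUTPUT_DIR", "/data/lora-output"),
        epochs=int(env.get("EPOCHS", "3")),
        batch_size=int(env.get("BATCH_SIZE", "4")),
        lr=float(env.get("LR", "2e-4")),
        lora_r=int(env.get("LORA_R", "16")),
        max_length=int(env.get("MAX_LENGTH", "4096")),
        # Only the literal value "1" skips data preparation.
        skip_prep=env.get("SKIP_PREP", "0") == "1",
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dict instead of `os.environ` makes the defaults easy to inspect without touching the real environment.
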
## Datasets

Three datasets are combined and converted to SmolLM3's native token format:

1. **interstellarninja/tool-calls-multiturn**: multi-turn tool-calling conversations
2. **NousResearch/Hermes-Function-Calling-V1**: Hermes-format function calling
3. **Salesforce/xLAM-function-calling-60k**: large-scale function calling (60k samples)

Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens:

- Tool calls → `startPos`/`endPos` (token IDs 128002/128016)
- Tool responses → `eni`/`eni_result` (token IDs 128013/128014)

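
The filter-and-normalize step can be sketched roughly as follows. The message schema (a list of dicts with `role` and an optional `tool_calls` list) is an assumption made for illustration, since each source dataset has its own layout, and the helper names are hypothetical. The marker strings are the ones written in this README.

```python
import json
from typing import Any, Dict, List

# Marker names as written in this README.
TOOL_CALL_OPEN = "startPos"
TOOL_CALL_CLOSE = "endPos"

# Assumed schema: a conversation is a list of {"role": ..., ...} messages,
# and assistant messages may carry a "tool_calls" list.
Conversation = List[Dict[str, Any]]


def has_tool_call(conversation: Conversation) -> bool:
    """True if any assistant message carries at least one tool call."""
    return any(
        msg.get("role") == "assistant" and bool(msg.get("tool_calls"))
        for msg in conversation
    )


def keep_tool_conversations(conversations: List[Conversation]) -> List[Conversation]:
    """Drop conversations that never call a tool."""
    return [conv for conv in conversations if has_tool_call(conv)]


def render_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Wrap one call as JSON between the open and close markers."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"{TOOL_CALL_OPEN}{payload}{TOOL_CALL_CLOSE}"


if __name__ == "__main__":
    print(render_tool_call("save_config", {"path": "a.json"}))
```

Tool responses would be wrapped the same way with the `eni`/`eni_result` pair.
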
## LoRA Configuration

- **Rank:** 16
- **Alpha:** 32
- **Target modules:** q/k/v/o projections + gate/up/down MLP
- **Dropout:** 0.05
- **Scheduler:** Cosine with 3% warmup
- **Optimizer:** AdamW (fused)
- **Gradient checkpointing:** Enabled

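
Two of these settings have a simple numeric meaning that a short sketch can make concrete: the LoRA scaling factor (alpha divided by rank) and the shape of a cosine schedule with 3% warmup. The function below follows the common linear-warmup plus cosine-decay form and assumes the default `LR` of `2e-4`; the trainer's actual scheduler may differ in detail, and `lr_at_step` is a hypothetical name.

```python
import math

LORA_RANK = 16
LORA_ALPHA = 32
# With standard LoRA scaling, the low-rank update is multiplied by alpha / rank.
LORA_SCALING = LORA_ALPHA / LORA_RANK


def lr_at_step(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-4,
    warmup_ratio: float = 0.03,
) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 15, 30, 500, 1000):
        print(s, lr_at_step(s, total_steps=1000))
```

With 1000 total steps, the warmup covers the first 30 steps, after which the rate decays from its peak toward zero.
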
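## Output

The trained adapter is saved to `$OUTPUT_DIR/final/`. To use it with vLLM:

```bash
# Merge adapter into base model (recommended for vLLM); merged/ is an example path
python - <<EOF
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained("$OUTPUT_DIR/final")
model.merge_and_unload().save_pretrained("$OUTPUT_DIR/merged")
EOF
# Or pass the adapter path directly with --enable-lora
vllm serve HuggingFaceTB/SmolLM3-3B --enable-lora --lora-modules smollora="$OUTPUT_DIR/final"
```
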
## SSH Deployment

```bash
# On GPU box, after SSH-ing in:
docker run --gpus all -v ~/smol-data:/data smollora

# Or with local model cache:
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/smol-data:/data \
  smollora
```