SmolLM3-3B LoRA — Tool Calling Fine-Tune

LoRA adapter training to teach SmolLM3-3B to emit native tool-call tokens.

Critical Training Objective

The base model does not emit structured tool-call tokens. When asked to use tools, it writes Python code that calls the tool as a function instead of emitting the <tool_call>/</tool_call> (token IDs 128002/128016) sequences that vLLM's Hermes parser expects. This was verified with a raw token inspector (/home/openclaw/dev/chat-template-debugger/) that bypasses all middleware and calls llm.generate() directly.
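
A quick way to confirm the mapping is to look the tokens up directly (a minimal check, assuming the token names and IDs stated above):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
# Should print 128002 and 128016 if the mapping above is right
for name in ("<tool_call>", "</tool_call>"):
    print(name, tok.convert_tokens_to_ids(name))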

The #1 priority for this LoRA run is to make the model emit tool-call tokens natively. Specifically:

  1. When the user asks the model to use a tool, the model should emit <tool_call> → JSON function call → </tool_call> instead of writing from tools import X / X(args) as Python code (see the example after this list)
  2. This must work for all tool patterns — not just structured JSON tools (save_config) but also code-generation tools (write_file) that the model currently code-dumps instead of calling
  3. The model should still produce clean text content when NOT invoking a tool — we're adding a capability, not replacing one
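
For example, when asked to call save_config, the fine-tuned model's assistant turn should contain the call natively (argument values here are illustrative):

<tool_call>
{"name": "save_config", "arguments": {"path": "app.yaml"}}
</tool_call>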

Why this matters

The current "working" save_config path through the vLLM API is not actually the model doing tool calls — the Hermes parser is reconstructing tool calls from the model's text/code output. This is fragile and fails for longer outputs (write_file). Once the model emits native tool-call tokens, both paths work correctly and the parser doesn't need to do salvage work.

Quick Start

# Build
docker build -t smollora .

# Run full pipeline (prepare data + train)
docker run --gpus all \
  -v /path/on/host/output:/data/lora-output \
  smollora

# Skip data prep if you already have processed data
docker run --gpus all \
  -e SKIP_PREP=1 \
  -v /path/on/host/processed:/data/processed \
  -v /path/on/host/output:/data/lora-output \
  smollora

Environment Variables

Var         Default                    Description
MODEL       HuggingFaceTB/SmolLM3-3B   Base model (HF repo or local path)
DATA_DIR    /data/processed            Processed data directory
OUTPUT_DIR  /data/lora-output          Training output directory
EPOCHS      3                          Training epochs
BATCH_SIZE  4                          Per-device batch size
LR          2e-4                       Learning rate
LORA_R      16                         LoRA rank
MAX_LENGTH  4096                       Max sequence length
SKIP_PREP   0                          Set to 1 to skip data preparation
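
For example, a shorter run with a larger rank (values are illustrative):

docker run --gpus all \
  -e EPOCHS=1 \
  -e LORA_R=32 \
  -v /path/on/host/output:/data/lora-output \
  smollora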

Datasets

Three datasets are combined and converted to SmolLM3's native token format:

  1. interstellarninja/tool-calls-multiturn — Multi-turn tool calling conversations
  2. NousResearch/Hermes-Function-Calling-V1 — Hermes-format function calling
  3. Salesforce/xLAM-function-calling-60k — Large-scale function calling (60k samples)

Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens (a minimal sketch follows the list):

  • Tool calls → <tool_call> ... </tool_call> (token IDs 128002/128016)
  • Tool responses → <tool_response> ... </tool_response> (token IDs 128013/128014)
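
A minimal sketch of that normalization (helper names are illustrative, not the repo's actual code):

import json

def wrap_tool_call(call: dict) -> str:
    # Render a parsed tool call in SmolLM3's native token format
    return "<tool_call>\n" + json.dumps(call) + "\n</tool_call>"

def wrap_tool_response(result) -> str:
    # Render a tool's return value for the following conversation turn
    return "<tool_response>\n" + json.dumps(result) + "\n</tool_response>"

print(wrap_tool_call({"name": "save_config", "arguments": {"path": "app.yaml"}}))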

LoRA Configuration

  • Rank: 16
  • Alpha: 32
  • Target modules: q/k/v/o projections + gate/up/down MLP
  • Dropout: 0.05
  • Scheduler: Cosine with 3% warmup
  • Optimizer: AdamW (fused)
  • Gradient checkpointing: Enabled
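
As a peft config this corresponds roughly to the following (a sketch; the module names assume SmolLM3 uses Llama-style projection names):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)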

Output

The trained adapter is saved to $OUTPUT_DIR/final/. To use with vLLM:

# Option 1: merge the adapter into the base model (recommended for vLLM);
#           see the Python sketch below
# Option 2: pass the adapter path directly to vLLM with --enable-lora
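
A minimal merge sketch (paths assume the defaults from the environment table; the output directory /data/merged-model is illustrative):

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
model = PeftModel.from_pretrained(base, "/data/lora-output/final")
model.merge_and_unload().save_pretrained("/data/merged-model")

With vLLM's multi-LoRA support, the unmerged adapter can instead be served via --enable-lora and --lora-modules.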

SSH Deployment

# On GPU box, after SSH-ing in:
docker run --gpus all -v ~/smol-data:/data smollora

# Or with local model cache:
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/smol-data:/data \
  smollora