README.md

# SmolLM3-3B LoRA — Tool Calling Fine-Tune

LoRA adapter training to make SmolLM3-3B a tool-calling savant.

## Quick Start

```bash
# Build
docker build -t smollora .

# Run full pipeline (prepare data + train)
docker run --gpus all \
  -v /path/on/host/output:/data/lora-output \
  smollora

# Skip data prep if you already have processed data
docker run --gpus all \
  -e SKIP_PREP=1 \
  -v /path/on/host/processed:/data/processed \
  -v /path/on/host/output:/data/lora-output \
  smollora
```

## Environment Variables

| Var | Default | Description |
|-----|---------|-------------|
| `MODEL` | `HuggingFaceTB/SmolLM3-3B` | Base model (HF repo or local path) |
| `DATA_DIR` | `/data/processed` | Processed data directory |
| `OUTPUT_DIR` | `/data/lora-output` | Training output directory |
| `EPOCHS` | `3` | Training epochs |
| `BATCH_SIZE` | `4` | Per-device batch size |
| `LR` | `2e-4` | Learning rate |
| `LORA_R` | `16` | LoRA rank |
| `MAX_LENGTH` | `4096` | Max sequence length |
| `SKIP_PREP` | `0` | Set to `1` to skip data preparation |

## Datasets

Three datasets combined and converted to SmolLM3's native token format:

1. **interstellarninja/tool-calls-multiturn** — Multi-turn tool calling conversations
2. **NousResearch/Hermes-Function-Calling-V1** — Hermes-format function calling
3. **Salesforce/xLAM-function-calling-60k** — Large-scale function calling (60k samples)

Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens:
- Tool calls → `startPos`/`endPos` (token IDs 128002/128016)
- Tool responses → `eni`/`eni_result` (token IDs 128013/128014)

## LoRA Configuration

- **Rank:** 16
- **Alpha:** 32
- **Target modules:** q/k/v/o projections + gate/up/down MLP
- **Dropout:** 0.05
- **Scheduler:** Cosine with 3% warmup
- **Optimizer:** AdamW (fused)
- **Gradient checkpointing:** Enabled

## Output

The trained adapter is saved to `$OUTPUT_DIR/final/`. To use with vLLM:

```bash
# Merge adapter into base model (recommended for vLLM)
python -m peft import PeftModel
# Or pass the adapter path directly with --enable-lora
```

## SSH Deployment

```bash
# On GPU box, after SSH-ing in:
docker run --gpus all -v ~/smol-data:/data smollora

# Or with local model cache:
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/smol-data:/data \
  smollora
```
Initial LoRA training setup for SmolLM3-3B tool calling 2026-04-10 05:11:05 +00:00			`# SmolLM3-3B LoRA — Tool Calling Fine-Tune`

			`LoRA adapter training to make SmolLM3-3B a tool-calling savant.`

			`## Quick Start`

			```bash
			`# Build`
			`docker build -t smollora .`

			`# Run full pipeline (prepare data + train)`
			`docker run --gpus all \`
			`-v /path/on/host/output:/data/lora-output \`
			`smollora`

			`# Skip data prep if you already have processed data`
			`docker run --gpus all \`
			`-e SKIP_PREP=1 \`
			`-v /path/on/host/processed:/data/processed \`
			`-v /path/on/host/output:/data/lora-output \`
			`smollora`
			```

			`## Environment Variables`

			`\| Var \| Default \| Description \|`
			`\|-----\|---------\|-------------\|`
			\| `MODEL` \| `HuggingFaceTB/SmolLM3-3B` \| Base model (HF repo or local path) \|
			\| `DATA_DIR` \| `/data/processed` \| Processed data directory \|
			\| `OUTPUT_DIR` \| `/data/lora-output` \| Training output directory \|
			\| `EPOCHS` \| `3` \| Training epochs \|
			\| `BATCH_SIZE` \| `4` \| Per-device batch size \|
			\| `LR` \| `2e-4` \| Learning rate \|
			\| `LORA_R` \| `16` \| LoRA rank \|
			\| `MAX_LENGTH` \| `4096` \| Max sequence length \|
			\| `SKIP_PREP` \| `0` \| Set to `1` to skip data preparation \|

			`## Datasets`

			`Three datasets combined and converted to SmolLM3's native token format:`

			`1. interstellarninja/tool-calls-multiturn — Multi-turn tool calling conversations`
			`2. NousResearch/Hermes-Function-Calling-V1 — Hermes-format function calling`
			`3. Salesforce/xLAM-function-calling-60k — Large-scale function calling (60k samples)`

			`Only conversations containing tool calls are kept. All are normalized to SmolLM3's special tokens:`
			- Tool calls → `startPos`/`endPos` (token IDs 128002/128016)
			- Tool responses → `eni`/`eni_result` (token IDs 128013/128014)

			`## LoRA Configuration`

			`- Rank: 16`
			`- Alpha: 32`
			`- Target modules: q/k/v/o projections + gate/up/down MLP`
			`- Dropout: 0.05`
			`- Scheduler: Cosine with 3% warmup`
			`- Optimizer: AdamW (fused)`
			`- Gradient checkpointing: Enabled`

			`## Output`

			The trained adapter is saved to `$OUTPUT_DIR/final/`. To use with vLLM:

			```bash
			`# Merge adapter into base model (recommended for vLLM)`
			`python -m peft import PeftModel`
			`# Or pass the adapter path directly with --enable-lora`
			```

			`## SSH Deployment`

			```bash
			`# On GPU box, after SSH-ing in:`
			`docker run --gpus all -v ~/smol-data:/data smollora`

			`# Or with local model cache:`
			`docker run --gpus all \`
			`-v ~/.cache/huggingface:/root/.cache/huggingface \`
			`-v ~/smol-data:/data \`
			`smollora`
			```