Add critical training objective: teach model to emit native tool-call tokens
# SmolLM3-3B LoRA — Tool Calling Fine-Tune
LoRA adapter training to teach SmolLM3-3B to emit native tool-call tokens.
## Critical Training Objective
The base model **does not emit structured tool-call tokens**. When asked to use tools, it writes Python code that *calls* the tool as a function instead of emitting the `startPos`/`endPos` (token IDs 128002/128016) sequences that vLLM's Hermes parser expects. This was verified definitively using a raw token inspector (`/home/openclaw/dev/chat-template-debugger/`) that bypasses all middleware and calls `llm.generate()` directly.
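
As an illustration of the check involved (this is not the actual inspector's code, and the helper name is ours), detecting a native tool call in the raw output reduces to scanning the generated token IDs for a `startPos` followed by a matching `endPos` — the IDs 128002/128016 named above:

```python
# Hypothetical sketch: the special-token IDs are SmolLM3-3B's
# startPos/endPos as stated above; the function name is illustrative.
START_ID, END_ID = 128002, 128016

def has_native_tool_call(token_ids):
    """True if the sequence contains a startPos later closed by an endPos."""
    try:
        start = token_ids.index(START_ID)
    except ValueError:
        return False  # model never opened a tool call
    return END_ID in token_ids[start + 1:]
```

Run against `llm.generate()` output token IDs, the base model fails this check even when its text *looks* like a tool invocation, because the invocation is plain Python source rather than the special-token framing.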
**The #1 priority for this LoRA run** is to make the model emit tool-call tokens natively. Specifically:
1. When the user asks the model to use a tool, the model should emit `startPos` → JSON function call → `endPos` instead of writing `from tools import X` / `X(args)` as Python code
2. This must work for **all** tool patterns — not just structured JSON tools (save_config) but also code-generation tools (write_file) that the model currently code-dumps instead of calling
3. The model should still produce clean text content when NOT invoking a tool — we're adding a capability, not replacing one
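
Concretely, the supervision target for objective 1 can be sketched as follows. The special-token IDs are the ones named above; the tokenizer interface and helper name are assumptions for illustration, not the training code:

```python
import json

START_ID, END_ID = 128002, 128016  # startPos / endPos from the section above

def tool_call_target(tokenize, name, arguments):
    """Build the label sequence for a native tool call:
    startPos -> tokenized JSON function call -> endPos (hypothetical helper)."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return [START_ID] + tokenize(payload) + [END_ID]

# Toy tokenizer for illustration only: one "token" per character code.
toy_tokenize = lambda s: [ord(c) for c in s]
ids = tool_call_target(toy_tokenize, "save_config", {"path": "a.yml"})
```

The key point is that the JSON call sits *inside* the special-token bracket, so the loss directly rewards emitting 128002/128016 rather than `from tools import X` boilerplate.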
### Why this matters
The current "working" save_config path through the vLLM API is not actually the model doing tool calls — the Hermes parser is reconstructing tool calls from the model's text/code output. This is fragile and fails for longer outputs (write_file). Once the model emits native tool-call tokens, both paths work correctly and the parser doesn't need to do salvage work.
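
To see why text-level salvage is fragile, here is a deliberately naive reconstruction of a tool call from code-style output — roughly the kind of fallback a parser is forced into when no native tokens appear. The regex and function are illustrative, not the Hermes parser's actual implementation:

```python
import re

def salvage_tool_call(text):
    """Naive fallback: pull `name(args)` out of generated Python code.
    Handles a one-liner like save_config(...), but the single-line
    pattern breaks as soon as the arguments span multiple lines."""
    m = re.search(r"(\w+)\((.*)\)", text)
    if m is None:
        return None
    return {"name": m.group(1), "raw_args": m.group(2)}

salvage_tool_call("save_config(path='a.yml')")   # recovers a dict
salvage_tool_call("write_file(\n  path='a.py',\n  content='...'\n)")  # None
```

A short `save_config` one-liner survives this salvage; a multi-line `write_file` code dump does not — which is exactly the failure mode described above. Native tool-call tokens make the salvage step unnecessary.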
## Quick Start