Document model's inability to emit native tool-call tokens
- ✅ Multi-turn tool use: the model reads tool results and answers properly
- ⚠️ The 3B model doesn't reliably choose tools over free text for complex tasks (it writes code as content instead of calling `write_file`). This is a model capability gap, not a parsing issue; a LoRA is planned to address it.
## Next Steps

- **LoRA training** to make tool calling more reliable (especially forced tool-use scenarios)
  - Candidate dataset: `interstellarninja/tool-calls-multiturn`
  - Also worth considering: `NousResearch/Hermes-Function-Calling-V1`, `Salesforce/xLAM-function-calling-60k`

## Known Limitation: Model Doesn't Emit Native Tool-Call Tokens
**Verified via raw token inspection (chat-template-debugger):** SmolLM3-3B does **not** natively emit structured tool-call tokens for any tool-use prompt. When asked to use `write_file` or `save_config`, the model writes Python code that *calls* the tool as a function (`save_config(config)`) instead of emitting the `startPos`/`endPos` token sequences that vLLM's parser expects.
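The mismatch can be sketched with a rough classifier over raw completions. This is a hypothetical helper, not part of chat-template-debugger, and it assumes Hermes-style `<tool_call>…</tool_call>` delimiters as a stand-in for the `startPos`/`endPos` sequences named above:

```python
import json
import re

# Stand-in delimiters: the README calls these the startPos/endPos token
# sequences; Hermes-style parsers use <tool_call>...</tool_call>. Adjust
# to whatever the deployed chat template actually emits.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def classify_output(text: str) -> str:
    """Rough triage of a raw completion: native tool call vs. code dump."""
    m = TOOL_CALL_RE.search(text)
    if m:
        try:
            json.loads(m.group(1))       # arguments must be valid JSON
            return "native tool call"
        except json.JSONDecodeError:
            return "malformed tool call"
    if "def " in text or "import " in text:
        return "code dump"               # what SmolLM3-3B actually produces
    return "free text"

native = '<tool_call>{"name": "save_config", "arguments": {"path": "a.json"}}</tool_call>'
dump = "from tools import save_config\nsave_config(config)"
print(classify_output(native))  # → native tool call
print(classify_output(dump))    # → code dump
```

Running this over raw `llm.generate()` output makes the gap visible: every tool-use prompt lands in the "code dump" bucket.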
### What's happening under the hood
| Prompt | Raw `llm.generate()` | Via vLLM API (chat template) |
|--------|----------------------|------------------------------|
| `write_file` (short) | ❌ Code-dumps `def write_file(...)` in a loop | ❌ Fails: parser can't extract a tool call from code |
| `save_config` (nested JSON) | ❌ Writes `from tools import save_config` + prose | ✅ "Passes", but the parser is reconstructing the call from text |
| `save_config` (streaming) | ❌ Same as above | ✅ Streams correctly: parser extracts JSON from prose/code |
The save_config "pass" is **not** the model emitting tool-call tokens. The chat template + Hermes parser is doing salvage work — it sees the model describing the tool call in text/code and restructures it into the `tool_calls` field. This works for structured JSON output (save_config) but breaks for longer code output (write_file) because the parser can't reliably extract a clean function call from a full Python implementation.
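What that salvage path amounts to can be sketched as a naive brace-balanced JSON extractor. This is an illustrative stand-in, not vLLM's actual Hermes parser:

```python
import json

def extract_first_json(text: str):
    """Pull the first balanced JSON object out of free text, roughly what
    the parser's salvage path does. Naive: ignores braces inside strings."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(text[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break            # not JSON; try the next "{"
        start = text.find("{", start + 1)
    return None

# Nested JSON in prose (the save_config case) is recoverable...
prose = 'Saving now: {"theme": {"mode": "dark"}} as requested.'
print(extract_first_json(prose))   # → {'theme': {'mode': 'dark'}}

# ...but a full Python implementation (the write_file case) yields nothing.
code = "def write_file(path, content):\n    open(path, 'w').write(content)"
print(extract_first_json(code))    # → None
```

A longer code dump either contains no JSON object at all or only stray dict literals that fail to parse, which is why the `write_file` case breaks while `save_config` appears to pass.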
### Root cause
The model was trained on code and general instruction following, not on tool-calling token sequences. It *understands* what tools are conceptually (it names them, describes them, writes code that calls them) but it was never trained to emit the `startPos`/`endPos` token delimiters that signal a real tool invocation to the parser.
### Planned fix
**LoRA fine-tuning** to teach the model to emit native tool-call tokens. The training data in `smollora` already converts all tool calls to the correct `startPos`/`endPos` format. Once the model learns these token sequences, it should emit them directly instead of falling back to code-dumping. This will fix both the write_file and save_config cases at the model level, eliminating the parser's salvage work.
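The data-prep direction can be sketched as follows. This is a hypothetical converter, not smollora's actual code, and the literal `<tool_call>`/`</tool_call>` strings are placeholders for the `startPos`/`endPos` tokens:

```python
import json

START, END = "<tool_call>", "</tool_call>"  # placeholder delimiter tokens

def to_native_format(message: dict) -> str:
    """Render an assistant message so its tool calls appear as delimited
    token sequences rather than as prose or code."""
    parts = [message.get("content") or ""]
    for call in message.get("tool_calls", []):
        payload = {"name": call["name"], "arguments": call["arguments"]}
        parts.append(f"{START}{json.dumps(payload)}{END}")
    return "\n".join(p for p in parts if p)

msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"name": "save_config",
                    "arguments": {"path": "cfg.json", "theme": "dark"}}],
}
print(to_native_format(msg))
```

Training on targets rendered this way is what teaches the model to emit the delimiter tokens itself instead of describing the call in text.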
See: `/home/openclaw/dev/smollora/README.md` for LoRA training details.
See: `/home/openclaw/dev/chat-template-debugger/` for the raw token inspector that proved this.