
SmolLM3-3B Tool Call Fix — Notes

Status: SOLVED

All three template bugs fixed, reasoning parser working, tool calling functional.

What Was Fixed

Bug 1: Tool responses rendered as plain user messages

Tool responses showed up as <|im_start|>user\n... — the model couldn't distinguish them from new user turns and kept re-calling tools. Fixed by wrapping tool responses in the model's dedicated tool_response_start/tool_response_end tokens (128013/128014).
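
A quick way to confirm the rendering (a sketch: assumes the patched template file is on disk and the SmolLM3 tokenizer can be loaded; the message shapes are illustrative):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
    tok.chat_template = open("model-files/chat_template.jinja").read()

    messages = [
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "tool_calls": [{"type": "function", "function": {
            "name": "get_weather", "arguments": {"city": "Paris"}}}]},
        {"role": "tool", "content": '{"temp_c": 18}'},
    ]
    rendered = tok.apply_chat_template(messages, tokenize=False)

    # The tool turn should sit between the dedicated delimiter tokens...
    start, end = tok.decode([128013]), tok.decode([128014])
    assert start in rendered and end in rendered
    # ...and no longer masquerade as a fresh user turn.
    assert '<|im_start|>user\n{"temp_c"' not in rendered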

Bug 2: Assistant tool_calls not rendered in history

When an assistant message had tool_calls, the template rendered only the content and dropped the tool call array, so the model never saw its own prior invocations. Fixed by rendering tool calls with the tool_call_start/tool_call_end tokens (128015/128016).
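
For reference, the target shape of a rendered assistant tool-call turn. The delimiter names below are stand-ins, since the real tokens are single private-use-area characters, and the exact layout is an assumption:

    # Placeholders stand in for the undisplayable PUA delimiter tokens.
    TOOL_CALL_START = "<tool_call_start>"  # token ID 128015
    TOOL_CALL_END = "<tool_call_end>"      # token ID 128016

    expected_assistant_turn = (
        "<|im_start|>assistant\n"
        + TOOL_CALL_START
        + '{"name": "get_weather", "arguments": {"city": "Paris"}}'  # previously dropped
        + TOOL_CALL_END
        + "<|im_end|>\n"
    )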

Bug 3: Thinking mode direction swapped

/think mode produced a bare assistant prompt (no think tags) while /no_think wrapped the prompt in think tags. Completely backwards. Fixed: /think now opens a <think> block and /no_think renders plain text.
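
A direction check for the two flags (same sketch assumptions as the Bug 1 snippet):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
    tok.chat_template = open("model-files/chat_template.jinja").read()

    def gen_prompt(flag: str) -> str:
        return tok.apply_chat_template(
            [{"role": "system", "content": flag},
             {"role": "user", "content": "hi"}],
            tokenize=False,
            add_generation_prompt=True,
        )

    assert "<think>" in gen_prompt("/think")         # now opens a think block
    assert "<think>" not in gen_prompt("/no_think")  # now a plain prompt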

Special Tokens

Token ID   Text                         Purpose
128013     (PUA character)              Tool response start
128014     (PUA character)              Tool response end
128015     (PUA character)              Tool call start
128016     (PUA character)              Tool call end

(The Text column is each token's literal private-use-area character, which doesn't display in editors; see gen_template.py below.)

Patched Files (in model-files/)

chat_template.jinja — Fixed template

Three fixes applied:

  1. Tool responses wrapped in tool_response_start/tool_response_end tokens
  2. Assistant tool_calls rendered in tool_call_start/tool_call_end format
  3. Thinking mode direction corrected

Uses Jinja2's ~ operator (not +) to avoid type errors when message.content is None.
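
The difference is easy to demonstrate with plain jinja2:

    from jinja2 import Template

    # '+' raises at render time when content is None:
    try:
        Template("{{ 'prefix' + content }}").render(content=None)
    except TypeError as err:
        print("'+' raises:", err)

    # '~' string-coerces both operands instead of raising:
    print(Template("{{ 'prefix' ~ content }}").render(content=None))  # "prefixNone"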

gen_template.py — Template generator

Regenerates chat_template.jinja inside the container where the tokenizer is available. Required because the special tokens are Unicode private-use-area characters that can't be typed in editors.
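
The real generator is model-files/gen_template.py; the sketch below only illustrates the approach, with an invented placeholder scheme and input path:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

    # Look the undisplayable PUA delimiters up by ID...
    SPECIALS = {
        "@TOOL_RESPONSE_START@": 128013,
        "@TOOL_RESPONSE_END@": 128014,
        "@TOOL_CALL_START@": 128015,
        "@TOOL_CALL_END@": 128016,
    }

    # ...then splice their literal text into a template source that uses
    # typeable placeholders (hypothetical input file).
    template = open("/tmp/chat_template.jinja.in").read()
    for placeholder, token_id in SPECIALS.items():
        template = template.replace(placeholder, tok.decode([token_id]))

    with open("/root/chat_template.jinja", "w") as out:
        out.write(template)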

smol_tool_parser.py — Tool call parser; an unchanged copy of vLLM's hermes_tool_parser.py, kept here in case it ever needs patching

The stock vLLM Hermes parser works as-is for parsing the delimited tool call blocks the model emits. No patches needed.
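
The extraction idea reduced to a sketch (not vLLM's actual code; delimiters are written as literal tags here for readability):

    import json
    import re

    TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

    def extract_tool_calls(text: str) -> list[dict]:
        # Pull every JSON payload out of a delimited tool-call block.
        return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

    calls = extract_tool_calls(
        '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
    )
    assert calls[0]["name"] == "get_weather"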

Reasoning Parser — NOT PATCHED

The built-in deepseek_r1 reasoning parser in vLLM works with SmolLM3 out of the box — they share the same <think>...</think> tokens. Verified by diffing the container's copy against the vLLM source: identical, no patches needed.

Deploying

  1. Generate template inside the container:

    docker cp model-files/gen_template.py smol-vllm-1:/tmp/
    docker exec smol-vllm-1 python3 /tmp/gen_template.py
    
  2. Copy to mounted volume and restart:

    docker cp smol-vllm-1:/root/chat_template.jinja /root/smol/chat_template.jinja
    cd /root/smol && docker compose restart
    
  3. Required vLLM flags:

    --chat-template=/root/chat_template.jinja
    --enable-auto-tool-choice
    --tool-call-parser=hermes
    --reasoning-parser=deepseek_r1
    --chat-template-content-format=string
    
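Once the server is back up, a smoke test through the OpenAI-compatible API (a sketch: the base URL and served model name are assumptions):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    resp = client.chat.completions.create(
        model="HuggingFaceTB/SmolLM3-3B",
        messages=[{"role": "user", "content": "Weather in Paris?"}],
        tools=tools,
    )
    msg = resp.choices[0].message
    print(msg.tool_calls)                           # parsed by --tool-call-parser=hermes
    print(getattr(msg, "reasoning_content", None))  # split out by --reasoning-parser=deepseek_r1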

Test Results

  • Tool response tests: All PASS (streaming + non-streaming)
  • Streaming tool calls: Incremental, 325+ chunks
  • Reasoning parser: Correctly splits thinking/content
  • Multi-turn tool use: Model reads results, answers properly
  • ⚠️ The 3B model doesn't reliably choose tools over free text for complex tasks (it writes code as message content instead of calling write_file). This is a model capability gap, not a parsing issue; a LoRA fine-tune is planned to address it.

Next Steps

  • LoRA training to make tool calling more reliable (especially forced tool use scenarios)
  • Candidate dataset: interstellarninja/tool-calls-multiturn
  • Also worth considering: NousResearch/Hermes-Function-Calling-V1, Salesforce/xLAM-function-calling-60k