SmolLM3-3B Tool Call Fix — Notes

Problem

The SmolLM3-3B model's chat template has three bugs that break multi-turn tool calling in vLLM.

Bugs Found

Bug 1: Tool responses rendered as plain user messages

Location: chat_template.jinja, main loop, message.role == "tool" branch

Original:

{%- elif message.role == "tool" -%}
{{ "<|im_start|>" + "user\n"  + content + "<|im_end|>\n" }}

Tool responses show up as <|im_start|>user\n...<|im_end|>, so the model cannot distinguish a tool result from a new user turn. When it sees tool output (for example, weather data) in what looks like a user message, it re-invokes the tool instead of answering.

Fix: Use the model's dedicated tool_response_start/tool_response_end tokens (128013/128014) to wrap tool responses so the model can distinguish them from user messages.
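
As a rough illustration of the intended output, here is a minimal Python sketch of how a tool message could be rendered. The marker strings are readable placeholders (an assumption) standing in for the tokenizer's tool_response_start/tool_response_end tokens, and `render_tool_message` is a hypothetical helper, not part of the template or of vLLM.

```python
# Placeholders standing in for the real special tokens (IDs 128013/128014).
TOOL_RESPONSE_START = "[tool_response_start]"
TOOL_RESPONSE_END = "[tool_response_end]"


def render_tool_message(content):
    """Render a tool result inside a user turn, wrapped in response markers."""
    # A tool message may carry content=None; render that as empty text.
    body = "" if content is None else str(content)
    return (
        "<|im_start|>user\n"
        + TOOL_RESPONSE_START + "\n"
        + body + "\n"
        + TOOL_RESPONSE_END + "<|im_end|>\n"
    )
```

The turn still uses the user role, but the markers give the model a signal that the text is a tool result rather than something the user typed.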

Bug 2: Assistant tool_calls not rendered in history

Location: chat_template.jinja, main loop, message.role == "assistant" branch

When the assistant message has tool_calls, the template only renders content (often empty/None) and drops the entire tool_calls array. The model never sees its own prior tool invocations.

Fix: Render tool calls using the model's native tool_call_start/tool_call_end tokens (128015/128016) with proper JSON format.
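
A minimal Python sketch of the rendering this fix aims for, assuming OpenAI-style messages where `tool_calls[i].function.arguments` may arrive either as a JSON string or as an already-parsed dict. The marker strings are placeholders for the real tool_call_start/tool_call_end tokens, and `render_assistant_tool_calls` is a hypothetical name used only for illustration.

```python
import json

# Placeholders standing in for the real special tokens (IDs 128015/128016).
TOOL_CALL_START = "[tool_call_start]"
TOOL_CALL_END = "[tool_call_end]"


def render_assistant_tool_calls(message):
    """Render an assistant turn, including any tool calls it made."""
    parts = ["<|im_start|>assistant\n"]
    content = message.get("content")
    if content:  # content is often empty or None when tool_calls is set
        parts.append(content + "\n")
    calls = []
    for tc in message.get("tool_calls") or []:
        fn = tc["function"]
        args = fn["arguments"]
        # Decode string arguments so the call is rendered as nested JSON
        # rather than as an escaped string.
        if isinstance(args, str):
            args = json.loads(args)
        call = json.dumps({"name": fn["name"], "arguments": args})
        calls.append(TOOL_CALL_START + "\n" + call + "\n" + TOOL_CALL_END)
    parts.append("\n".join(calls))
    parts.append("<|im_end|>\n")
    return "".join(parts)
```

Separating multiple calls with a newline is a design choice made for this sketch; the source only describes the single-call case.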

Bug 3: Thinking mode inverted

Location: chat_template.jinja, main loop and generation prompt

When reasoning_mode == "/think", the template does not wrap content in `<think>...</think>` tags; when reasoning_mode == "/no_think", it does. The mapping is inverted.

Fix: /think mode wraps content in `<think>...</think>` tags; /no_think renders plain text.
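
The corrected mapping can be sketched in a few lines of Python. This is only an illustration of the rule stated above: the exact whitespace around the tags is an assumption, and `render_assistant_content` is a hypothetical helper rather than code from the template.

```python
def render_assistant_content(content, reasoning_mode):
    """Wrap content in think tags only when thinking is enabled."""
    if reasoning_mode == "/think":
        # Exact whitespace around the tags is an assumption.
        return "<think>\n" + content + "\n</think>"
    # "/no_think": plain text, no tags.
    return content
```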

Special Tokens

The model has these tool-related tokens in its tokenizer (added_tokens_decoder):

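| Token ID | Text | Purpose |
|---|---|---|
| 128002 | `<think>` | Think start |
| 128003 | `</think>` | Think end |
| 128013 | `<tool_response>` | Tool response start |
| 128014 | `</tool_response>` | Tool response end |
| 128015 | `<tool_call>` | Tool call start |
| 128016 | `</tool_call>` | Tool call end |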

How the Fix Works

Template Changes

  1. Tool responses now render as:

    <|im_start|>user
    [tool_response_start]
    {tool result content}
    [tool_response_end]<|im_end|>
    

    Instead of a bare user message.

  2. Assistant tool calls now render as:

    <|im_start|>assistant
    [tool_call_start]
    {"name": "func_name", "arguments": {...}}
    [tool_call_end]<|im_end|>
    

    Instead of being dropped entirely.

  3. Thinking mode is now correctly mapped: /think → think tags, /no_think → plain text.

Key Technical Details

  • The template uses Jinja2's ~ operator instead of + for string concatenation. This avoids type errors when message.content is None (Jinja2's ~ coerces to string, + does not).
  • The tool_call_start/tool_call_end tokens are Unicode private-use-area characters that can't be typed in a text editor. The template must be generated programmatically using gen_template.py.
  • The tc.function.name and tc.function.arguments Jinja2 dot notation works correctly because Jinja2 resolves dict.key as dict["key"].
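  • The {% generation %} tag is not standard Jinja2. It is a chat-template extension from Hugging Face transformers that marks the assistant output region (used for assistant-token masking). It must be preserved.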
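
The difference between the two concatenation operators can be illustrated in plain Python. This is an analogy, not Jinja2 itself: Jinja2's `+` defers to Python's `+`, while `~` converts each operand to a string before joining. `concat_plus` and `concat_tilde` are hypothetical names used only for this sketch.

```python
def concat_plus(left, right):
    """Mimic Jinja2's `+`: plain Python addition, no coercion."""
    return left + right


def concat_tilde(left, right):
    """Mimic Jinja2's `~`: convert both operands to str, then join."""
    return str(left) + str(right)


prefix = "<|im_start|>assistant\n"

# `+` with a None operand raises, which is the failure the template hit.
try:
    concat_plus(prefix, None)
    plus_result = "ok"
except TypeError:
    plus_result = "TypeError"

# `~` never raises here, but None becomes the literal text "None".
tilde_result = concat_tilde(prefix, None)
```

Because `~` turns None into the text "None", a guard such as `(message.content or "")` in the template is probably still worthwhile wherever content can be missing.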

Files

  • model-files/chat_template.jinja: The fixed template (generated, contains Unicode PUA characters)
  • model-files/gen_template.py: Script to regenerate the template inside the container where the tokenizer is available
  • model-files/hermes_tool_parser.py: vLLM Hermes tool parser (unchanged, works as-is for parsing the `<tool_call>...</tool_call>` format)
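
Since private-use characters are usually invisible in an editor, a small check can confirm whether a generated template contains any. The following is a minimal sketch using only the standard library; `find_private_use_chars` is a hypothetical helper and is not part of gen_template.py.

```python
import unicodedata


def find_private_use_chars(text):
    """Return (index, code point) for each private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]
```

To inspect the template, read it with `open("model-files/chat_template.jinja", encoding="utf-8").read()` and pass the text to the function.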

Deploying

  1. Run gen_template.py inside the vLLM container:

    docker cp model-files/gen_template.py smol-vllm-1:/tmp/
    docker exec smol-vllm-1 python3 /tmp/gen_template.py
    
  2. Copy the generated template to the mounted volume:

    docker cp smol-vllm-1:/root/chat_template.jinja /root/smol/chat_template.jinja
    
  3. Restart the container:

    cd /root/smol && docker compose restart
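
After the restart, one way to check the fix is to send a request whose history already contains a tool call and its result. The sketch below only builds such a request body in the OpenAI chat format, which vLLM's OpenAI-compatible server accepts at `/v1/chat/completions`. The `get_weather` tool, the call ID, and the function name `build_followup_request` are assumptions made for illustration.

```python
import json


def build_followup_request(model, tool_result):
    """Build a chat request whose history holds a tool call and its result."""
    call_id = "call_0"  # hypothetical ID; it only has to match the tool message
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": "What is the weather in Paris?"},
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": json.dumps({"city": "Paris"}),
                    },
                }],
            },
            {"role": "tool", "tool_call_id": call_id, "content": tool_result},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Hypothetical weather lookup",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }
```

If the template is working, the response to this request should contain a text answer rather than another call to the same tool.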
    

Remaining Issues

  • The model sometimes re-invokes tools in a loop instead of providing a final text answer. This is likely a training issue with the /no_think mode: the model outputs reasoning as content text but still generates tool calls.
  • The Hermes tool parser works for parsing `<tool_call>...</tool_call>` blocks, but the streaming parser may buffer long argument strings. This is a vLLM-level issue, not a template issue.