biondizzle/model-tool-tests

Jinx 1186c9d816 SmolLM3-3B tool call fix: template bugs found and patched

2026-04-10 03:33:16 +00:00

SmolLM3-3B Tool Call Fix — Notes

Problem

The SmolLM3-3B model's chat template has three bugs that break multi-turn tool calling in vLLM.

Bugs Found

Bug 1: Tool responses rendered as plain user messages

Location: chat_template.jinja, main loop, message.role == "tool" branch

Original:

{%- elif message.role == "tool" -%}
{{ "<|im_start|>" + "user\n"  + content + "<|im_end|>\n" }}

Tool responses show up as <|im_start|>user\n...<|im_end|> — the model cannot distinguish a tool result from a new user turn. When it sees weather data in a user message, it re-invokes the tool instead of answering.
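To make the ambiguity concrete, here is a minimal Python sketch that mimics only the two original branches in question. The function name `render_original` and the sample payload are hypothetical and are not part of the repository.

```python
def render_original(message: dict) -> str:
    """Mimic the original template's user and tool branches (sketch only)."""
    role, content = message["role"], message["content"]
    if role in ("user", "tool"):
        # Both roles end up under the same "user" header in the original template.
        return "<|im_start|>user\n" + content + "<|im_end|>\n"
    raise ValueError(f"role not covered by this sketch: {role}")


# Hypothetical payload: the same text once as a tool result, once as a user turn.
tool_msg = {"role": "tool", "content": '{"temp_c": 21}'}
user_msg = {"role": "user", "content": '{"temp_c": 21}'}

# The rendered prompts are byte-for-byte identical, so the role information is lost.
assert render_original(tool_msg) == render_original(user_msg)
```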

Fix: Use the model's dedicated tool_response_start/tool_response_end tokens (128013/128014) to wrap tool responses so the model can distinguish them from user messages.

Bug 2: Assistant tool_calls not rendered in history

Location: chat_template.jinja, main loop, message.role == "assistant" branch

When the assistant message has tool_calls, the template only renders content (often empty/None) and drops the entire tool_calls array. The model never sees its own prior tool invocations.

Fix: Render tool calls using the model's native tool_call_start/tool_call_end tokens (128015/128016) with proper JSON format.

Bug 3: Thinking mode inverted

Location: chat_template.jinja, main loop and generation prompt

When reasoning_mode == "/think", the template does not wrap content in think tags. When reasoning_mode == "/no_think", it does. The two branches are swapped.

Fix: /think mode wraps content in think tags. /no_think renders plain text.
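The corrected mapping can be sketched in Python as below. The `[think_start]` and `[think_end]` strings are placeholders rather than the real token text, and the newline placement is an assumption, so treat this as an illustration of the branch logic only.

```python
# Placeholders standing in for the model's think tokens (not the real token text).
THINK_START = "[think_start]"
THINK_END = "[think_end]"


def render_assistant_content(content: str, reasoning_mode: str) -> str:
    """Wrap content in think tags only when reasoning is enabled."""
    if reasoning_mode == "/think":
        return THINK_START + "\n" + content + "\n" + THINK_END
    if reasoning_mode == "/no_think":
        # Plain text, no think wrapper.
        return content
    raise ValueError(f"unknown reasoning_mode: {reasoning_mode!r}")
```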

Special Tokens

The model has these tool-related tokens in its tokenizer (added_tokens_decoder):

Token ID	Text	Purpose
128002	`...`	Think end
128013	`...`	Tool response start
128014	`...`	Tool response end
128015	`...`	Tool call start
128016	`...`	Tool call end
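One way to check the token text for these IDs without typing the characters by hand is to read them from the model's tokenizer_config.json, where Hugging Face tokenizers store added_tokens_decoder as a mapping from string IDs to entries with a "content" field. This is a sketch: the helper name `token_texts` is hypothetical, and the file path depends on where the model files live.

```python
import json
from pathlib import Path


def token_texts(config_path, token_ids):
    """Return {token_id: text} for the requested IDs found in added_tokens_decoder."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    decoder = config["added_tokens_decoder"]
    # JSON object keys are strings, so integer IDs must be converted for lookup.
    return {
        tid: decoder[str(tid)]["content"]
        for tid in token_ids
        if str(tid) in decoder
    }
```

For example, `token_texts("tokenizer_config.json", [128013, 128014, 128015, 128016])` would return the text of the four tool-related tokens, assuming that file sits in the current directory.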

How the Fix Works

Template Changes

Tool responses now render as:

<|im_start|>user
[tool_response_start]
{tool result content}
[tool_response_end]<|im_end|>

Instead of a bare user message.
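A Python equivalent of this layout, written as a sketch rather than a copy of the Jinja template, looks like the following. The bracketed strings are the same placeholders used above, not the real token text.

```python
# Placeholders for the model's tool response tokens (not the real token text).
TOOL_RESPONSE_START = "[tool_response_start]"
TOOL_RESPONSE_END = "[tool_response_end]"


def render_tool_response(content: str) -> str:
    """Render a tool result inside a user turn, wrapped in tool response markers."""
    return (
        "<|im_start|>user\n"
        + TOOL_RESPONSE_START + "\n"
        + content + "\n"
        + TOOL_RESPONSE_END + "<|im_end|>\n"
    )
```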

Assistant tool calls now render as:

<|im_start|>assistant
[tool_call_start]
{"name": "func_name", "arguments": {...}}
[tool_call_end]<|im_end|>

Instead of being dropped entirely.
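The same rendering can be sketched in Python. It assumes OpenAI-style messages, where each tool call carries a function.name and function.arguments, and where arguments may arrive as a JSON string. The placeholders stand in for the real tokens, and how several calls in one message are separated is an assumption of this sketch.

```python
import json

# Placeholders for the model's tool call tokens (not the real token text).
TOOL_CALL_START = "[tool_call_start]"
TOOL_CALL_END = "[tool_call_end]"


def render_assistant_tool_calls(message: dict) -> str:
    """Render an assistant turn, keeping its tool calls instead of dropping them."""
    parts = ["<|im_start|>assistant\n"]
    if message.get("content"):
        # Content is often empty or None when the turn only calls tools.
        parts.append(message["content"] + "\n")
    for tc in message.get("tool_calls") or []:
        fn = tc["function"]
        args = fn["arguments"]
        if isinstance(args, str):
            # Some clients send arguments as a JSON-encoded string.
            args = json.loads(args)
        payload = json.dumps({"name": fn["name"], "arguments": args})
        parts.append(TOOL_CALL_START + "\n" + payload + "\n" + TOOL_CALL_END)
    parts.append("<|im_end|>\n")
    return "".join(parts)
```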

Thinking mode is now correctly mapped: /think → think tags, /no_think → plain text.

Key Technical Details

The template uses Jinja2's ~ operator instead of + for string concatenation. This avoids type errors when message.content is None (Jinja2's ~ coerces to string, + does not).
The tool_call_start/tool_call_end tokens are Unicode private-use-area characters that can't be typed in a text editor. The template must be generated programmatically using gen_template.py.
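The tc.function.name and tc.function.arguments Jinja2 dot notation works correctly on plain dicts because Jinja2 falls back to dict["key"] when attribute lookup for dict.key fails.
The {% generation %} tag is not standard Jinja2. It is a chat-template extension from Hugging Face transformers that marks the assistant output region, and vLLM renders templates that contain it. It must be preserved.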
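As an illustration of generating the template programmatically, the sketch below substitutes readable placeholders in a template skeleton with the real token text. It is not the repository's gen_template.py: the function name, the placeholder syntax, and the stand-in private-use characters in the example are all assumptions.

```python
import re

# Matches the readable placeholders used in the skeleton, e.g. [tool_call_start].
PLACEHOLDER = re.compile(r"\[(tool_(?:call|response)_(?:start|end))\]")


def fill_placeholders(skeleton: str, token_text: dict) -> str:
    """Replace each [name] placeholder with the token text supplied for that name."""
    def substitute(match):
        name = match.group(1)
        if name not in token_text:
            # Fail loudly rather than ship a template with a leftover placeholder.
            raise KeyError(f"no token text supplied for placeholder [{name}]")
        return token_text[name]

    return PLACEHOLDER.sub(substitute, skeleton)


# Hypothetical skeleton line using Jinja2's ~ operator for concatenation.
skeleton = (
    r'{{ "<|im_start|>user\n[tool_response_start]\n" ~ content '
    r'~ "\n[tool_response_end]<|im_end|>\n" }}'
)
# Stand-in characters only; the real text would come from the tokenizer.
filled = fill_placeholders(
    skeleton,
    {"tool_response_start": "\ue000", "tool_response_end": "\ue001"},
)
```

Keeping the skeleton readable and injecting the token text at generation time means the special characters never have to be typed or pasted by hand.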

Files

model-files/chat_template.jinja — The fixed template (generated, contains Unicode PUA characters)
model-files/gen_template.py — Script to regenerate the template inside the container where the tokenizer is available
model-files/hermes_tool_parser.py — vLLM Hermes tool parser (unchanged, works as-is for parsing the tool-call format)

Deploying

Run gen_template.py inside the vLLM container:

docker cp model-files/gen_template.py smol-vllm-1:/tmp/
docker exec smol-vllm-1 python3 /tmp/gen_template.py

Copy the generated template to the mounted volume:

docker cp smol-vllm-1:/root/chat_template.jinja /root/smol/chat_template.jinja

Restart the container:

cd /root/smol && docker compose restart
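The three steps can also be scripted. The sketch below only wraps the commands shown above with subprocess. The function names are hypothetical, and the container name and paths default to the values used in these notes.

```python
import subprocess


def deploy_commands(container="smol-vllm-1", host_dir="/root/smol"):
    """Return the deploy steps as (argv, working_directory) pairs."""
    return [
        (["docker", "cp", "model-files/gen_template.py", f"{container}:/tmp/"], None),
        (["docker", "exec", container, "python3", "/tmp/gen_template.py"], None),
        (["docker", "cp", f"{container}:/root/chat_template.jinja",
          f"{host_dir}/chat_template.jinja"], None),
        # The restart runs from the directory holding the compose file.
        (["docker", "compose", "restart"], host_dir),
    ]


def deploy(container="smol-vllm-1", host_dir="/root/smol"):
    """Run each step in order and stop at the first failure."""
    for argv, cwd in deploy_commands(container, host_dir):
        subprocess.run(argv, check=True, cwd=cwd)
```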

Remaining Issues

The model sometimes re-invokes tools in a loop instead of providing a final text answer. This is likely a training issue with the /no_think mode — the model outputs reasoning as content text but still generates tool calls.
The Hermes tool parser works for parsing tool-call blocks, but the streaming parser may buffer long argument strings. This is a vLLM-level issue, not a template issue.
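Until the looping behaviour is understood, a client can cap the number of tool rounds and then ask for a final answer with tools disabled. This is a possible client-side workaround, not something the template or vLLM does. In the sketch below, `chat` and `execute_tool` are hypothetical callables supplied by the caller.

```python
def run_with_tool_limit(chat, execute_tool, messages, max_rounds=3):
    """Run a tool-calling conversation, forcing a plain answer after max_rounds.

    chat(history, allow_tools) -> assistant message dict (hypothetical callable)
    execute_tool(tool_call)    -> tool result as a string (hypothetical callable)
    """
    history = list(messages)
    for _ in range(max_rounds):
        reply = chat(history, allow_tools=True)
        history.append(reply)
        if not reply.get("tool_calls"):
            # The model answered in text, so the conversation is done.
            return reply
        for tc in reply["tool_calls"]:
            history.append({
                "role": "tool",
                "tool_call_id": tc.get("id"),
                "content": execute_tool(tc),
            })
    # Tool budget exhausted: request one last reply with tools disabled.
    return chat(history, allow_tools=False)
```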
