init commit

This commit is contained in:
2026-04-10 13:55:43 +00:00
commit 5029ab3b40
12 changed files with 1251979 additions and 0 deletions

2
Dockerfile Normal file

@@ -0,0 +1,2 @@
FROM vllm/vllm-openai:v0.19.0

80
README.md Normal file

@@ -0,0 +1,80 @@
# SmolLM3-3B Tool Call Fix — Notes
## Status: SOLVED ✅
All three template bugs fixed, reasoning parser working, tool calling functional.
## What Was Fixed
### Bug 1: Tool responses rendered as plain user messages
Tool responses showed up as `<|im_start|>user\n...` — model couldn't distinguish them from new user turns and kept re-calling tools. Fixed by wrapping tool responses with the model's dedicated `tool_response_start`/`tool_response_end` tokens (128013/128014).
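For reference, a minimal Python sketch of the fixed rendering for a tool-role message. The literal `<tool_response>` text here stands in for the model's dedicated special tokens (128013/128014) that the deployed template actually emits:

```python
# Sketch of the fixed tool-message rendering; literal tag text stands in
# for the dedicated tool_response special tokens (128013/128014).
def render_tool_message(content: str) -> str:
    return (
        "<|im_start|>user\n"
        "<tool_response>\n" + content + "\n</tool_response>"
        "<|im_end|>\n"
    )

print(render_tool_message('{"temp_c": 21}'))
```

Wrapped this way, the model can tell a tool result apart from a fresh user turn instead of re-calling the tool.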
### Bug 2: Assistant tool_calls not rendered in history
When assistant message had `tool_calls`, the template only rendered `content` and dropped the tool call array. Model never saw its own prior invocations. Fixed by rendering tool calls using `tool_call_start`/`tool_call_end` tokens (128015/128016).
### Bug 3: Thinking mode direction swapped
`/think` mode produced a bare assistant prompt (no think tags) while `/no_think` wrapped the response in think tags, which was exactly backwards. Fixed: `/think` now opens a think block and `/no_think` renders plain text.
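The corrected direction of the generation prompt can be sketched as follows (literal `<think>` text standing in for the model's special think tokens):

```python
# Corrected generation-prompt logic: /think opens a think block so the
# model actually generates reasoning; /no_think leaves the prompt bare.
def generation_prompt(reasoning_mode: str) -> str:
    if reasoning_mode == "/think":
        return "<|im_start|>assistant\n<think>\n"
    return "<|im_start|>assistant\n"

print(repr(generation_prompt("/think")))
```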
## Special Tokens
| Token ID | Text | Purpose |
|----------|------|---------|
| 128002 | `...` | Think start |
| 128003 | `...` | Think end |
| 128013 | `...` | Tool response start |
| 128014 | `...` | Tool response end |
| 128015 | `...` | Tool call start |
| 128016 | `...` | Tool call end |
## Patched Files (in model-files/)
### `chat_template.jinja` — Fixed template
Three fixes applied:
1. Tool responses wrapped in `tool_response_start`/`tool_response_end` tokens
2. Assistant tool_calls rendered in `tool_call_start`/`tool_call_end` format
3. Thinking mode direction corrected
Uses Jinja2 `~` operator (not `+`) to avoid type errors when `message.content` is None.
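The difference is easy to demonstrate: Jinja's `~` coerces both operands to strings, while `+` does Python concatenation and raises on `str + None`:

```python
from jinja2 import Template

# '~' stringifies operands, so a None value renders as "None" instead of
# raising; '+' performs Python concatenation and fails on str + None.
print(Template('{{ "assistant: " ~ content }}').render(content=None))

try:
    Template('{{ "assistant: " + content }}').render(content=None)
except TypeError as exc:
    print("'+' raised:", exc)
```

The template additionally guards with `message.content is string`, so `None` content is replaced by an empty string before it ever reaches the operator.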
### `gen_template.py` — Template generator
Regenerates `chat_template.jinja` inside the container where the tokenizer is available. Required because the special tokens are Unicode private-use-area characters that can't be typed in editors.
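A quick way to see why editors choke on these characters: the decoded token strings fall in the Unicode private use areas. The `is_private_use` helper below is illustrative, not part of the repo:

```python
# Hypothetical helper (not in the repo): True if a character sits in a
# Unicode private use area, which is where the special tokens decode to.
def is_private_use(ch: str) -> bool:
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF          # BMP Private Use Area
        or 0xF0000 <= cp <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD   # Supplementary Private Use Area-B
    )

print(is_private_use("\ue000"), is_private_use("<"))  # True False
```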
### `smol_tool_parser.py` — Tool call parser (unchanged copy of `hermes_tool_parser.py`, kept in case patches become necessary)
The stock vLLM Hermes parser works as-is for parsing `<tool_call>...</tool_call>` blocks. No patches needed.
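The non-streaming extraction boils down to the two-alternative regex the parser compiles: a closed `<tool_call>...</tool_call>` block, or an unterminated trailing one. A standalone sketch:

```python
import json
import re

# Same pattern smol_tool_parser.py compiles: a closed block, or an
# unterminated trailing <tool_call> at end-of-string.
tool_call_regex = re.compile(r"<tool_call>(.*?)</tool_call>|<tool_call>(.*)", re.DOTALL)

output = (
    "Let me check.\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
# findall yields (closed, open) tuples; exactly one side is non-empty.
calls = [json.loads(a if a else b) for a, b in tool_call_regex.findall(output)]
print(calls[0]["name"])  # get_weather
```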
## Reasoning Parser — NOT PATCHED
The built-in `deepseek_r1` reasoning parser in vLLM works with SmolLM3 out of the box, since both use the same `<think>`/`</think>` tokens. Verified by diffing the container's copy against the vLLM source: identical, no patches needed.
## Deploying
1. Generate template inside the container:
```bash
docker cp model-files/gen_template.py smol-vllm-1:/tmp/
docker exec smol-vllm-1 python3 /tmp/gen_template.py
```
2. Copy to mounted volume and restart:
```bash
docker cp smol-vllm-1:/root/chat_template.jinja /root/smol/chat_template.jinja
cd /root/smol && docker compose restart
```
3. Required vLLM flags:
```
--chat-template=/root/chat_template.jinja
--enable-auto-tool-choice
--tool-call-parser=hermes
--reasoning-parser=deepseek_r1
--chat-template-content-format=string
```
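With those flags in place, a request like the following exercises the full tool-calling path (POST the JSON body to the server's `/v1/chat/completions` endpoint; the `get_weather` tool and question are made-up examples):

```python
import json

# Illustrative request body; the get_weather tool schema is a made-up example.
payload = {
    "model": "HuggingFaceTB/SmolLM3-3B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
body = json.dumps(payload)
print(body[:40])
```

A correct response should carry the parsed call in `choices[0].message.tool_calls` rather than as raw `<tool_call>` text in `content`.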
## Test Results
- ✅ Tool response tests: All PASS (streaming + non-streaming)
- ✅ Streaming tool calls: Incremental, 325+ chunks
- ✅ Reasoning parser: Correctly splits thinking/content
- ✅ Multi-turn tool use: Model reads results, answers properly
- ⚠️ 3B model doesn't reliably choose tools over free text for complex tasks (writes code as content instead of calling write_file). This is a model capability gap, not a parsing issue; a LoRA fine-tune is planned to address it.
## Next Steps
- **LoRA training** to make tool calling more reliable (especially forced tool use scenarios)
- Candidate dataset: `interstellarninja/tool-calls-multiturn`
- Also worth considering: `NousResearch/Hermes-Function-Calling-V1`, `Salesforce/xLAM-function-calling-60k`

102
chat_template.jinja Normal file

@@ -0,0 +1,102 @@
{# ───── defaults ───── #}
{%- if enable_thinking is not defined -%}
{%- set enable_thinking = true -%}
{%- endif -%}
{# ───── reasoning mode ───── #}
{%- if enable_thinking -%}
{%- set reasoning_mode = "/think" -%}
{%- else -%}
{%- set reasoning_mode = "/no_think" -%}
{%- endif -%}
{# ───── header (system message) ───── #}
{%- set system_message = "" -%}
{%- set custom_instructions = "" -%}
{{- "<|im_start|>system\n" -}}
{%- if messages[0].role == "system" -%}
{%- set system_message = messages[0].content -%}
{%- if "/no_think" in system_message -%}
{%- set reasoning_mode = "/no_think" -%}
{%- elif "/think" in system_message -%}
{%- set reasoning_mode = "/think" -%}
{%- endif -%}
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{%- endif -%}
{%- if "/system_override" in system_message -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{%- else -%}
{{- "## Metadata\n\n" -}}
{{- "Knowledge Cutoff Date: June 2025\n" -}}
{%- set today = strftime_now("%d %B %Y") -%}
{{- "Today Date: " ~ today ~ "\n" -}}
{{- "Reasoning Mode: " + reasoning_mode + "\n\n" -}}
{{- "## Custom Instructions\n\n" -}}
{%- if custom_instructions -%}
{{- custom_instructions + "\n\n" -}}
{%- else -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face.\n\n" -}}
{%- endif -%}
{%- if xml_tools or python_tools or tools -%}
{{- "### Tools\n\n" -}}
{%- if xml_tools or tools -%}
{%- if tools -%}
{%- set xml_tools = tools -%}
{%- endif -%}
{%- set ns = namespace(xml_tool_string="You may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n\n<tools>\n") -%}
{%- for tool in xml_tools[:] -%}
{%- set ns.xml_tool_string = ns.xml_tool_string ~ (tool | tojson) ~ "\n" -%}
{%- endfor -%}
{%- set xml_tool_string = ns.xml_tool_string + "</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>" -%}
{{- xml_tool_string -}}
{%- endif -%}
{%- if python_tools -%}
{%- set ns = namespace(python_tool_string="You may call one or more functions as python tools.\n<tools>\n") -%}
{%- for tool in python_tools[:] -%}
{%- set ns.python_tool_string = ns.python_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set python_tool_string = ns.python_tool_string + "</tools>\n\nThe state persists between code executions." -%}
{{- python_tool_string -}}
{%- endif -%}
{{- "\n\n" -}}
{%- endif -%}
{%- endif -%}
{{- "<|im_end|>\n" -}}
{# ───── main loop ───── #}
{%- for message in messages -%}
{%- if message.role == "user" -%}
{{ "<|im_start|>user\n" ~ (message.content if message.content is string else "") ~ "<|im_end|>\n" }}
{%- elif message.role == "assistant" -%}
{% generation %}
{%- if message.tool_calls -%}
{%- set ns = namespace(tc_text="") -%}
{%- for tc in message.tool_calls -%}
{%- set ns.tc_text = ns.tc_text ~ "<tool_call>\n{\"name\": \"" ~ tc.function.name ~ "\", \"arguments\": " ~ tc.function.arguments ~ "}\n</tool_call>" -%}
{%- endfor -%}
{{ "<|im_start|>assistant\n" ~ (message.content if message.content is string else "") ~ ns.tc_text ~ "<|im_end|>\n" }}
{%- else -%}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n<think>\n" ~ (message.content if message.content is string else "") ~ "\n</think><|im_end|>\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" ~ (message.content if message.content is string else "") ~ "<|im_end|>\n" }}
{%- endif -%}
{%- endif -%}
{% endgeneration %}
{%- elif message.role == "tool" -%}
{{ "<|im_start|>user\n<tool_response>\n" ~ (message.content if message.content is string else "") ~ "\n</tool_response><|im_end|>\n" }}
{%- endif -%}
{%- endfor -%}
{# ───── generation prompt ───── #}
{%- if add_generation_prompt -%}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n<think>\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" }}
{%- endif -%}
{%- endif -%}

35
docker-compose.yaml Normal file

@@ -0,0 +1,35 @@
services:
  vllm:
    image: vllm/vllm-openai:v0.19.0
    pull_policy: always
    privileged: true
    environment:
      # Set HF_TOKEN in the host environment or an .env file; never commit a real token.
      - HF_TOKEN=${HF_TOKEN}
    command:
      - HuggingFaceTB/SmolLM3-3B
      - --host=0.0.0.0
      - --port=80
      - --chat-template-content-format=string
      - --chat-template=/root/chat_template.jinja
      - --enable-auto-tool-choice
      - --tool-call-parser=hermes
      - --reasoning-parser=deepseek_r1
      #- --max-model-len=131072
      #- --hf-overrides={"rope_scaling":{"type":"yarn","factor":2.0,"original_max_position_embeddings":65536}}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    security_opt:
      - seccomp:unconfined
    tty: true
    stdin_open: true
    volumes:
      - /srv:/root/.cache/huggingface
      - ./chat_template.jinja:/root/chat_template.jinja
      - ./hermes_tool_parser.py:/usr/local/lib/python3.12/dist-packages/vllm/tool_parsers/hermes_tool_parser.py
    network_mode: host

130
gen_template.py Normal file

@@ -0,0 +1,130 @@
#!/usr/bin/env python3
"""Generate the PRODUCTION fixed chat_template.jinja for SmolLM3-3B.

v2: fixed thinking-mode direction - /think now opens a think block
in the generation prompt so the model actually generates reasoning.
"""
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
THINK_S = tok.decode([128002])
THINK_E = tok.decode([128003])
RESP_S = tok.decode([128013])
RESP_E = tok.decode([128014])
TC_S = tok.decode([128015])
TC_E = tok.decode([128016])
T = []
# ─── defaults & system header ───
T.append(r"""{# ───── defaults ───── #}
{%- if enable_thinking is not defined -%}
{%- set enable_thinking = true -%}
{%- endif -%}
{# ───── reasoning mode ───── #}
{%- if enable_thinking -%}
{%- set reasoning_mode = "/think" -%}
{%- else -%}
{%- set reasoning_mode = "/no_think" -%}
{%- endif -%}
{# ───── header (system message) ───── #}
{%- set system_message = "" -%}
{%- set custom_instructions = "" -%}
{{- "<|im_start|>system\n" -}}
{%- if messages[0].role == "system" -%}
{%- set system_message = messages[0].content -%}
{%- if "/no_think" in system_message -%}
{%- set reasoning_mode = "/no_think" -%}
{%- elif "/think" in system_message -%}
{%- set reasoning_mode = "/think" -%}
{%- endif -%}
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{%- endif -%}
{%- if "/system_override" in system_message -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{%- else -%}
{{- "## Metadata\n\n" -}}
{{- "Knowledge Cutoff Date: June 2025\n" -}}
{%- set today = strftime_now("%d %B %Y") -%}
{{- "Today Date: " ~ today ~ "\n" -}}
{{- "Reasoning Mode: " + reasoning_mode + "\n\n" -}}
{{- "## Custom Instructions\n\n" -}}
{%- if custom_instructions -%}
{{- custom_instructions + "\n\n" -}}
{%- else -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face.\n\n" -}}
{%- endif -%}
{%- if xml_tools or python_tools or tools -%}
{{- "### Tools\n\n" -}}
{%- if xml_tools or tools -%}
{%- if tools -%}
{%- set xml_tools = tools -%}
{%- endif -%}
{%- set ns = namespace(xml_tool_string="You may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n\n<tools>\n") -%}
{%- for tool in xml_tools[:] -%}
{%- set ns.xml_tool_string = ns.xml_tool_string ~ (tool | tojson) ~ "\n" -%}
{%- endfor -%}""")
# Tool calling format with special tokens
T.append('\n {%- set xml_tool_string = ns.xml_tool_string + "</tools>\\n\\nFor each function call, return a json object with function name and arguments within ' + TC_S + ' XML tags:\\n' + TC_S + '\\n{\\"name\\": <function-name>, \\"arguments\\": <args-json-object>}\\n' + TC_E + '" -%}\n')
T.append(r""" {{- xml_tool_string -}}
{%- endif -%}
{%- if python_tools -%}
{%- set ns = namespace(python_tool_string="You may call one or more functions as python tools.\n<tools>\n") -%}
{%- for tool in python_tools[:] -%}
{%- set ns.python_tool_string = ns.python_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set python_tool_string = ns.python_tool_string + "</tools>\n\nThe state persists between code executions." -%}
{{- python_tool_string -}}
{%- endif -%}
{{- "\n\n" -}}
{%- endif -%}
{%- endif -%}
{{- "<|im_end|>\n" -}}""")
# ─── Main loop ───
T.append(r"""
{# ───── main loop ───── #}
{%- for message in messages -%}
{%- if message.role == "user" -%}
{{ "<|im_start|>user\n" ~ (message.content if message.content is string else "") ~ "<|im_end|>\n" }}
{%- elif message.role == "assistant" -%}
{% generation %}
{%- if message.tool_calls -%}""")
# FIX: Render tool calls with TC_S/TC_E tokens
T.append('\n {%- set ns = namespace(tc_text="") -%}\n {%- for tc in message.tool_calls -%}\n {%- set ns.tc_text = ns.tc_text ~ "' + TC_S + '\\n{\\"name\\": \\"" ~ tc.function.name ~ "\\", \\"arguments\\": " ~ tc.function.arguments ~ "}\\n' + TC_E + '" -%}\n {%- endfor -%}\n {{ "<|im_start|>assistant\\n" ~ (message.content if message.content is string else "") ~ ns.tc_text ~ "<|im_end|>\\n" }}\n')
T.append(r""" {%- else -%}""")
# FIX v2: /think = think tags, /no_think = plain text (CORRECT direction now)
T.append('\n {%- if reasoning_mode == "/think" -%}\n {{ "<|im_start|>assistant\\n' + THINK_S + '\\n" ~ (message.content if message.content is string else "") ~ "\\n' + THINK_E + '<|im_end|>\\n" }}\n {%- else -%}\n {{ "<|im_start|>assistant\\n" ~ (message.content if message.content is string else "") ~ "<|im_end|>\\n" }}\n {%- endif -%}\n')
T.append(r""" {%- endif -%}
{% endgeneration %}""")
# FIX: Tool role with RESP_S/RESP_E tokens
T.append('\n {%- elif message.role == "tool" -%}\n {{ "<|im_start|>user\\n' + RESP_S + '\\n" ~ (message.content if message.content is string else "") ~ "\\n' + RESP_E + '<|im_end|>\\n" }}\n')
T.append(r""" {%- endif -%}
{%- endfor -%}""")
# ─── Generation prompt ───
# FIX v2: /think opens a think block so the model generates reasoning, /no_think is bare
T.append('\n\n{# ───── generation prompt ───── #}\n{%- if add_generation_prompt -%}\n {%- if reasoning_mode == "/think" -%}\n {{ "<|im_start|>assistant\\n' + THINK_S + '\\n" }}\n {%- else -%}\n {{ "<|im_start|>assistant\\n" }}\n {%- endif -%}\n{%- endif -%}\n')
template = ''.join(T)
with open('/root/chat_template.jinja', 'w', encoding='utf-8') as f:
    f.write(template)
print("Production template v2 written to /root/chat_template.jinja")
print(f"Length: {len(template)} bytes")

View File

@@ -0,0 +1,207 @@
---
base_model: HuggingFaceTB/SmolLM3-3B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:HuggingFaceTB/SmolLM3-3B
- lora
- transformers
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
### Framework versions
- PEFT 0.18.1

View File

@@ -0,0 +1,46 @@
{
"alora_invocation_tokens": null,
"alpha_pattern": {},
"arrow_config": null,
"auto_mapping": null,
"base_model_name_or_path": "HuggingFaceTB/SmolLM3-3B",
"bias": "none",
"corda_config": null,
"ensure_weight_tying": false,
"eva_config": null,
"exclude_modules": null,
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 32,
"lora_bias": false,
"lora_dropout": 0.05,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"peft_version": "0.18.1",
"qalora_group_size": 16,
"r": 16,
"rank_pattern": {},
"revision": null,
"target_modules": [
"up_proj",
"gate_proj",
"v_proj",
"k_proj",
"o_proj",
"down_proj",
"q_proj"
],
"target_parameters": null,
"task_type": "CAUSAL_LM",
"trainable_token_indices": null,
"use_dora": false,
"use_qalora": false,
"use_rslora": false
}

Binary file not shown.

View File

@@ -0,0 +1,94 @@
{# ───── defaults ───── #}
{%- if enable_thinking is not defined -%}
{%- set enable_thinking = true -%}
{%- endif -%}
{# ───── reasoning mode ───── #}
{%- if enable_thinking -%}
{%- set reasoning_mode = "/think" -%}
{%- else -%}
{%- set reasoning_mode = "/no_think" -%}
{%- endif -%}
{# ───── header (system message) ───── #}
{{- "<|im_start|>system\n" -}}
{%- if messages[0].role == "system" -%}
{%- set system_message = messages[0].content -%}
{%- if "/no_think" in system_message -%}
{%- set reasoning_mode = "/no_think" -%}
{%- elif "/think" in system_message -%}
{%- set reasoning_mode = "/think" -%}
{%- endif -%}
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{%- endif -%}
{%- if "/system_override" in system_message -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{{- "<|im_end|>\n" -}}
{%- else -%}
{{- "## Metadata\n\n" -}}
{{- "Knowledge Cutoff Date: June 2025\n" -}}
{%- set today = strftime_now("%d %B %Y") -%}
{{- "Today Date: " ~ today ~ "\n" -}}
{{- "Reasoning Mode: " + reasoning_mode + "\n\n" -}}
{{- "## Custom Instructions\n\n" -}}
{%- if custom_instructions -%}
{{- custom_instructions + "\n\n" -}}
{%- elif reasoning_mode == "/think" -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracking, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> Thought section </think> Solution section. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion.\n\n" -}}
{%- else -%}
{{- "You are a helpful AI assistant named SmolLM, trained by Hugging Face.\n\n" -}}
{%- endif -%}
{%- if xml_tools or python_tools or tools -%}
{{- "### Tools\n\n" -}}
{%- if xml_tools or tools -%}
{%- if tools -%}
{%- set xml_tools = tools -%}
{%- endif -%}
{%- set ns = namespace(xml_tool_string="You may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n\n<tools>\n") -%}
{%- for tool in xml_tools[:] -%} {# The slicing makes sure that xml_tools is a list #}
{%- set ns.xml_tool_string = ns.xml_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set xml_tool_string = ns.xml_tool_string + "</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>" -%}
{{- xml_tool_string -}}
{%- endif -%}
{%- if python_tools -%}
{%- set ns = namespace(python_tool_string="When you send a message containing Python code between '<code>' and '</code>' tags, it will be executed in a stateful Jupyter notebook environment, and you will then be given the output to continue reasoning in an agentic loop.\n\nYou can use the following tools in your python code like regular functions:\n<tools>\n") -%}
{%- for tool in python_tools[:] -%} {# The slicing makes sure that python_tools is a list #}
{%- set ns.python_tool_string = ns.python_tool_string ~ (tool | string) ~ "\n" -%}
{%- endfor -%}
{%- set python_tool_string = ns.python_tool_string + "</tools>\n\nThe state persists between code executions: so variables that you define in one step are still available thereafter." -%}
{{- python_tool_string -}}
{%- endif -%}
{{- "\n\n" -}}
{{- "<|im_end|>\n" -}}
{%- endif -%}
{%- endif -%}
{# ───── main loop ───── #}
{%- for message in messages -%}
{%- set content = message.content if message.content is string else "" -%}
{%- if message.role == "user" -%}
{{ "<|im_start|>" + message.role + "\n" + content + "<|im_end|>\n" }}
{%- elif message.role == "assistant" -%}
{% generation %}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{%- endif -%}
{% endgeneration %}
{%- elif message.role == "tool" -%}
{{ "<|im_start|>" + "user\n" + content + "<|im_end|>\n" }}
{%- endif -%}
{%- endfor -%}
{# ───── generation prompt ───── #}
{%- if add_generation_prompt -%}
{%- if reasoning_mode == "/think" -%}
{{ "<|im_start|>assistant\n" }}
{%- else -%}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" }}
{%- endif -%}
{%- endif -%}

File diff suppressed because it is too large

View File

@@ -0,0 +1,15 @@
{
"backend": "tokenizers",
"bos_token": null,
"clean_up_tokenization_spaces": true,
"eos_token": "<|im_end|>",
"fast": false,
"is_local": false,
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 131072,
"pad_token": "<|im_end|>",
"tokenizer_class": "TokenizersBackend"
}

298
smol_tool_parser.py Normal file

@@ -0,0 +1,298 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
# Unmodified copy of vLLM's Hermes tool parser, kept as a local patch point for
# https://huggingface.co/HuggingFaceTB/SmolLM3-3B if model-specific changes become necessary.
import json
from collections.abc import Sequence

import regex as re

from vllm.entrypoints.chat_utils import make_tool_call_id
from vllm.entrypoints.openai.chat_completion.protocol import (
    ChatCompletionRequest,
)
from vllm.entrypoints.openai.engine.protocol import (
    DeltaFunctionCall,
    DeltaMessage,
    DeltaToolCall,
    ExtractedToolCallInformation,
    FunctionCall,
    ToolCall,
)
from vllm.entrypoints.openai.responses.protocol import ResponsesRequest
from vllm.logger import init_logger
from vllm.tokenizers import TokenizerLike
from vllm.tool_parsers.abstract_tool_parser import (
    Tool,
    ToolParser,
)
from vllm.utils.mistral import is_mistral_tokenizer

logger = init_logger(__name__)
def _partial_tag_overlap(text: str, tag: str) -> int:
    """Length of the longest prefix of `tag` that matches a suffix of `text`.

    E.g. text ending in "<tool_" returns 6 when tag is "<tool_call>".
    Returns 0 if there is no overlap.
    """
    max_check = min(len(tag) - 1, len(text))
    for k in range(max_check, 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


def _is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except (json.JSONDecodeError, ValueError):
        return False
class Hermes2ProToolParser(ToolParser):
    def __init__(self, tokenizer: TokenizerLike, tools: list[Tool] | None = None):
        super().__init__(tokenizer, tools)
        if is_mistral_tokenizer(tokenizer):
            logger.error("Detected Mistral tokenizer when using a Hermes model")
            self.model_tokenizer = tokenizer.tokenizer
        self.tool_call_start_token: str = "<tool_call>"
        self.tool_call_end_token: str = "</tool_call>"
        self.tool_call_regex = re.compile(
            r"<tool_call>(.*?)</tool_call>|<tool_call>(.*)", re.DOTALL
        )
        self.scratch_pad_regex = re.compile(
            r"<scratch_pad>(.*?)</scratch_pad>", re.DOTALL
        )
        if not self.model_tokenizer:
            raise ValueError(
                "The model tokenizer must be passed to the ToolParser "
                "constructor during construction."
            )
        # Streaming state: what has been sent to the client.
        self._sent_content_idx: int = 0

    def adjust_request(
        self, request: ChatCompletionRequest | ResponsesRequest
    ) -> ChatCompletionRequest | ResponsesRequest:
        request = super().adjust_request(request)
        if request.tools and request.tool_choice != "none":
            # do not skip special tokens because the tool_call tokens are
            # marked "special" in some models. Since they are skipped
            # prior to the call to the tool parser, it breaks tool calling.
            request.skip_special_tokens = False
        return request
    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        # sanity check; avoid unnecessary processing
        if self.tool_call_start_token not in model_output:
            return ExtractedToolCallInformation(
                tools_called=False, tool_calls=[], content=model_output
            )
        else:
            try:
                # there are two possible captures - between tags, or between a
                # tag and end-of-string so the result of
                # findall is an array of tuples where one is a function call and
                # the other is None
                function_call_tuples = self.tool_call_regex.findall(model_output)
                # load the JSON, and then use it to build the Function and
                # Tool Call
                raw_function_calls = [
                    json.loads(match[0] if match[0] else match[1])
                    for match in function_call_tuples
                ]
                tool_calls = [
                    ToolCall(
                        type="function",
                        function=FunctionCall(
                            name=function_call["name"],
                            # function call args are JSON but as a string
                            arguments=json.dumps(
                                function_call["arguments"], ensure_ascii=False
                            ),
                        ),
                    )
                    for function_call in raw_function_calls
                ]
                content = model_output[: model_output.find(self.tool_call_start_token)]
                return ExtractedToolCallInformation(
                    tools_called=True,
                    tool_calls=tool_calls,
                    content=content if content else None,
                )
            except Exception:
                logger.exception("Error in extracting tool call from response.")
                return ExtractedToolCallInformation(
                    tools_called=False, tool_calls=[], content=model_output
                )
    def _extract_content(self, current_text: str) -> str | None:
        """Return unsent non-tool-call text, or None.

        Holds back any suffix that could be a partial <tool_call> tag.
        """
        if self.tool_call_start_token not in current_text:
            overlap_length = _partial_tag_overlap(
                current_text, self.tool_call_start_token
            )
            sendable_idx = len(current_text) - overlap_length
        else:
            sendable_idx = current_text.index(self.tool_call_start_token)
        if sendable_idx > self._sent_content_idx:
            content = current_text[self._sent_content_idx : sendable_idx]
            self._sent_content_idx = sendable_idx
            return content
        return None

    def _extract_tool_call_jsons(self, text: str) -> list[tuple[str, bool]]:
        """Extract (json_text, is_complete) for each <tool_call> region."""
        results: list[tuple[str, bool]] = []
        pos = 0
        while True:
            start = text.find(self.tool_call_start_token, pos)
            if start == -1:
                break
            json_start = start + len(self.tool_call_start_token)
            json_end = text.find(self.tool_call_end_token, json_start)
            if json_end != -1:
                results.append((text[json_start:json_end].strip(), True))
                pos = json_end + len(self.tool_call_end_token)
            else:
                raw = text[json_start:]
                # Strip partial </tool_call> suffix if present.
                overlap = _partial_tag_overlap(raw, self.tool_call_end_token)
                if overlap:
                    raw = raw[:-overlap]
                tc_json = raw.strip()
                # Valid JSON without closing tag = complete body,
                # tag tokens just haven't arrived yet.
                is_complete = _is_valid_json(tc_json) if tc_json else False
                results.append((tc_json, is_complete))
                break
        return results
    @staticmethod
    def _extract_tool_name(tc_json: str) -> str | None:
        """Extract tool name, or None if the name isn't complete yet."""
        match = re.search(r'"name"\s*:\s*"([^"]+)"', tc_json)
        return match.group(1) if match else None

    @staticmethod
    def _extract_tool_args(tc_json: str, is_complete: bool) -> str | None:
        """Extract tool arguments from the tool call JSON.

        Given {"name": "f", "arguments": {"x": 1}}, returns '{"x": 1}'.
        When is_complete, strips the trailing '}' that closes the outer
        object (not the arguments). For partial JSON, returns as-is.
        """
        match = re.search(r'"arguments"\s*:\s*', tc_json)
        if not match:
            return None
        raw = tc_json[match.end() :]
        if is_complete:
            raw = raw.rstrip()
            if raw.endswith("}"):
                raw = raw[:-1].rstrip()
        return raw

    def _compute_args_diff(
        self, index: int, tc_json: str, is_complete: bool
    ) -> str | None:
        """Return new argument text not yet sent for tool `index`, or None."""
        args = self._extract_tool_args(tc_json, is_complete)
        if args is None or len(args) <= len(self.streamed_args_for_tool[index]):
            return None
        diff = args[len(self.streamed_args_for_tool[index]) :]
        self.streamed_args_for_tool[index] = args
        self.prev_tool_call_arr[index]["arguments"] = args
        return diff
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> DeltaMessage | None:
        """Incrementally stream tool call deltas from accumulated output.

        On each invocation, re-parses the full ``current_text`` to find
        ``<tool_call>`` regions, then diffs against previously sent state
        to emit only new content, tool names, or argument fragments.

        Returns a ``DeltaMessage`` containing either plain content (for
        text preceding any tool call) or one or more ``DeltaToolCall``
        entries, or ``None`` if there is nothing new to send yet.
        """
        try:
            # Extract any content before tool calls.
            content = self._extract_content(current_text)
            tool_call_jsons = self._extract_tool_call_jsons(current_text)
            tool_call_deltas: list[DeltaToolCall] = []
            for i, (tc_json, is_complete) in enumerate(tool_call_jsons):
                if i >= len(self.prev_tool_call_arr):
                    self.prev_tool_call_arr.append({})
                    self.streamed_args_for_tool.append("")
                # Stream back tool name.
                if "name" not in self.prev_tool_call_arr[i]:
                    name = self._extract_tool_name(tc_json)
                    if not name:
                        # Can't skip to tool i+1 if tool i isn't ready.
                        break
                    self.prev_tool_call_arr[i]["name"] = name
                    tool_call_deltas.append(
                        DeltaToolCall(
                            index=i,
                            type="function",
                            id=make_tool_call_id(),
                            function=DeltaFunctionCall(name=name).model_dump(
                                exclude_none=True
                            ),
                        )
                    )
                # Stream back new tool args by diffing against what was sent.
                args_diff = self._compute_args_diff(i, tc_json, is_complete)
                if args_diff:
                    tool_call_deltas.append(
                        DeltaToolCall(
                            index=i,
                            function=DeltaFunctionCall(arguments=args_diff).model_dump(
                                exclude_none=True
                            ),
                        )
                    )
            if content or tool_call_deltas:
                return DeltaMessage(
                    content=content,
                    tool_calls=tool_call_deltas,
                )
            return None
        except Exception:
            logger.exception("Error trying to handle streaming tool call.")
            return None