init commit

2026-04-10 13:55:43 +00:00
commit 5029ab3b40
12 changed files with 1251979 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,80 @@
+# SmolLM3-3B Tool Call Fix — Notes
+
+## Status: SOLVED ✅
+
+All three template bugs fixed, reasoning parser working, tool calling functional.
+
+## What Was Fixed
+
+### Bug 1: Tool responses rendered as plain user messages
+Tool responses were rendered as `<|im_start|>user\n...`, so the model could not distinguish them from new user turns and kept re-calling tools. Fixed by wrapping tool responses in the model's dedicated `tool_response_start`/`tool_response_end` tokens (128013/128014).
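+
+A minimal Python sketch of this rendering rule, not the actual Jinja template. The marker strings are placeholders standing in for tokens 128013/128014, `render_tool_message` is a hypothetical helper, and keeping the `user` role header around the wrapped result is an assumption:
+
+```python
+# Placeholders for the real special tokens (IDs 128013 / 128014).
+TOOL_RESPONSE_START = "<TOOL_RESPONSE_START>"
+TOOL_RESPONSE_END = "<TOOL_RESPONSE_END>"
+
+
+def render_tool_message(content: str, fixed: bool = True) -> str:
+    """Render one role="tool" message as a ChatML-style turn."""
+    if fixed:
+        # Fixed behaviour: the result is wrapped in dedicated markers.
+        body = f"{TOOL_RESPONSE_START}\n{content}\n{TOOL_RESPONSE_END}"
+    else:
+        # Old behaviour: the result looks exactly like a new user turn.
+        body = content
+    return f"<|im_start|>user\n{body}<|im_end|>\n"
+```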
+
+### Bug 2: Assistant tool_calls not rendered in history
+When an assistant message had `tool_calls`, the template rendered only `content` and dropped the tool call array, so the model never saw its own prior invocations. Fixed by rendering tool calls using the `tool_call_start`/`tool_call_end` tokens (128015/128016).
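+
+A rough Python sketch of how an assistant turn with `tool_calls` could be rendered into history. The marker strings are placeholders for tokens 128015/128016, `render_assistant` is a hypothetical helper, and the JSON payload shape (`name` plus `arguments`) is assumed from the Hermes convention rather than copied from the template:
+
+```python
+import json
+
+TOOL_CALL_START = "<TOOL_CALL_START>"  # placeholder for token 128015
+TOOL_CALL_END = "<TOOL_CALL_END>"      # placeholder for token 128016
+
+
+def render_assistant(message: dict) -> str:
+    """Render an assistant turn, including any prior tool calls."""
+    parts = [message.get("content") or ""]
+    for call in message.get("tool_calls") or []:
+        fn = call["function"]
+        args = fn["arguments"]
+        # OpenAI-style clients send arguments as a JSON string.
+        if isinstance(args, str):
+            args = json.loads(args)
+        payload = json.dumps({"name": fn["name"], "arguments": args})
+        parts.append(f"{TOOL_CALL_START}\n{payload}\n{TOOL_CALL_END}")
+    body = "\n".join(p for p in parts if p)
+    return f"<|im_start|>assistant\n{body}<|im_end|>\n"
+```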
+
+### Bug 3: Thinking mode direction swapped
+`/think` mode produced a bare assistant prompt (no think tags), while `/no_think` wrapped the prompt in think tags. Completely backwards. Fixed: `/think` opens `<think>` tags, `/no_think` is plain text.
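+
+The corrected direction can be sketched in Python as follows. `generation_prompt` is a hypothetical helper, and the literal `<think>` tag is assumed from the tokens the `deepseek_r1` reasoning parser expects:
+
+```python
+def generation_prompt(thinking: bool) -> str:
+    """Return the assistant prefix appended before generation."""
+    prompt = "<|im_start|>assistant\n"
+    if thinking:
+        # /think: open the reasoning block so the model starts by thinking.
+        prompt += "<think>\n"
+    # /no_think: bare assistant prompt, no think tags.
+    return prompt
+```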
+
+## Special Tokens
+
+| Token ID | Text | Purpose |
+|----------|------|---------|
+| 128015 | `...` | Tool call start |
+| 128016 | `...` | Tool call end |
+
+## Patched Files (in model-files/)
+
+### `chat_template.jinja` — Fixed template
+Three fixes applied:
+1. Tool responses wrapped in `tool_response_start`/`tool_response_end` tokens
+2. Assistant tool_calls rendered in `tool_call_start`/`tool_call_end` format
+3. Thinking mode direction corrected
+
+Uses Jinja2 `~` operator (not `+`) to avoid type errors when `message.content` is None.
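+
+The difference can be mimicked in plain Python. This is only an analogy for the two Jinja2 operators, and both helper names are hypothetical:
+
+```python
+def plus_concat(prefix, content):
+    # Mirrors Jinja2 `prefix + content`: raises TypeError when content is None.
+    return prefix + content
+
+
+def tilde_concat(*parts):
+    # Mirrors Jinja2 `a ~ b`: every operand is converted to a string first.
+    return "".join(str(p) for p in parts)
+```
+
+Because `~` stringifies its operands, a `None` value comes out as the text `None` rather than raising, so a template would typically still pair it with a default such as `message.content or ""`.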
+
+### `gen_template.py` — Template generator
+Regenerates `chat_template.jinja` inside the container where the tokenizer is available. Required because the special tokens are Unicode private-use-area characters that can't be typed in editors.
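+
+One way such a generator might obtain the token text is to look it up by ID from the loaded tokenizer. The sketch below is an assumption about the approach, not the contents of `gen_template.py`. `resolve_special_tokens` is a hypothetical helper, and it relies only on the `convert_ids_to_tokens` method that Hugging Face tokenizers provide:
+
+```python
+def resolve_special_tokens(tokenizer, token_ids):
+    """Map token IDs to their literal text using the loaded tokenizer."""
+    return {tid: tokenizer.convert_ids_to_tokens(tid) for tid in token_ids}
+
+
+# Example use inside the container (model path is an assumption):
+# from transformers import AutoTokenizer
+# tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
+# tokens = resolve_special_tokens(tok, [128013, 128014, 128015, 128016])
+```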
+
+### `smol_tool_parser.py` — Tool call parser is just the unchanged hermes_tool_parser.py in case we need to change it
+The stock vLLM Hermes parser works as-is for parsing `<tool_call>...</tool_call>` blocks. No patches needed.
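+
+For orientation, the Hermes convention puts one JSON object inside each `<tool_call>` block. The following is a simplified, non-streaming sketch of that idea and not vLLM's implementation. `extract_tool_calls` is a hypothetical helper:
+
+```python
+import json
+import re
+
+# One JSON object per <tool_call> ... </tool_call> block.
+TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
+
+
+def extract_tool_calls(text: str) -> list:
+    """Return the parsed JSON payload of every well-formed tool call block."""
+    calls = []
+    for match in TOOL_CALL_RE.finditer(text):
+        try:
+            calls.append(json.loads(match.group(1)))
+        except json.JSONDecodeError:
+            # Skip blocks whose body is not valid JSON.
+            continue
+    return calls
+```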
+
+## Reasoning Parser — NOT PATCHED
+
+The built-in `deepseek_r1` reasoning parser in vLLM works with SmolLM3 out of the box, since they share the same `<think>...</think>` tokens. Verified by diffing the container's copy against the vLLM source: identical, no patches needed.
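+
+Conceptually, the parser separates the reasoning block from the final answer. A simplified Python sketch of that split is shown below. It is not vLLM's code, `split_reasoning` is a hypothetical helper, and edge cases such as output with no closing tag may be handled differently by the real parser:
+
+```python
+from typing import Optional, Tuple
+
+
+def split_reasoning(text: str) -> Tuple[Optional[str], str]:
+    """Split raw model output into (reasoning, content)."""
+    end = "</think>"
+    if end not in text:
+        # No reasoning block found: treat everything as content.
+        return None, text
+    reasoning, _, content = text.partition(end)
+    # The opening tag may already be part of the prompt, so it is optional.
+    reasoning = reasoning.replace("<think>", "", 1)
+    return reasoning.strip(), content.strip()
+```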
+
+## Deploying
+
+1. Generate template inside the container:
+   ```bash
+   docker cp model-files/gen_template.py smol-vllm-1:/tmp/
+   docker exec smol-vllm-1 python3 /tmp/gen_template.py
+   ```
+
+2. Copy to mounted volume and restart:
+   ```bash
+   docker cp smol-vllm-1:/root/chat_template.jinja /root/smol/chat_template.jinja
+   cd /root/smol && docker compose restart
+   ```
+
+3. Required vLLM flags:
+   ```
+   --chat-template=/root/chat_template.jinja
+   --enable-auto-tool-choice
+   --tool-call-parser=hermes
+   --reasoning-parser=deepseek_r1
+   --chat-template-content-format=string
+   ```
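+
+With those flags set, a tool-calling request against the OpenAI-compatible endpoint might look like the sketch below. The base URL, the served model name, and the `get_weather` tool are assumptions for illustration. Adjust them to the actual deployment:
+
+```python
+import json
+import urllib.request
+
+# Assumed values, not taken from the compose file.
+BASE_URL = "http://localhost:8000/v1"
+MODEL = "HuggingFaceTB/SmolLM3-3B"
+
+
+def build_request(prompt: str) -> dict:
+    """Build a chat completion payload that offers one tool to the model."""
+    return {
+        "model": MODEL,
+        "messages": [{"role": "user", "content": prompt}],
+        "tools": [{
+            "type": "function",
+            "function": {
+                "name": "get_weather",  # hypothetical tool
+                "description": "Get the current weather for a city.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {"city": {"type": "string"}},
+                    "required": ["city"],
+                },
+            },
+        }],
+        "tool_choice": "auto",
+    }
+
+
+def send(payload: dict) -> dict:
+    """POST the payload to the server and return the decoded JSON response."""
+    req = urllib.request.Request(
+        f"{BASE_URL}/chat/completions",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+    )
+    with urllib.request.urlopen(req) as resp:
+        return json.load(resp)
+```
+
+If the parser flags are active, a successful tool call should appear under `choices[0].message.tool_calls` in the response instead of as free text in `content`.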
+
+## Test Results
+
+- ✅ Tool response tests: All PASS (streaming + non-streaming)
+- ✅ Streaming tool calls: Incremental, 325+ chunks
+- ✅ Reasoning parser: Correctly splits thinking/content
+- ✅ Multi-turn tool use: Model reads results, answers properly
+- ⚠️ The 3B model doesn't reliably choose tools over free text for complex tasks (it writes code as content instead of calling `write_file`). This is a model capability gap, not a parsing issue. A LoRA fine-tune is planned to address it.
+
+## Next Steps
+
+- **LoRA training** to make tool calling more reliable (especially forced tool use scenarios)
+- Candidate dataset: `interstellarninja/tool-calls-multiturn`
+- Also worth considering: `NousResearch/Hermes-Function-Calling-V1`, `Salesforce/xLAM-function-calling-60k`