# vLLM GLM-5.x Tool Calling Patches

Fixes two critical bugs that prevent GLM models from working correctly with OpenAI-compatible tool calling in vLLM.

## Summary

GLM-5.x models would either crash or silently drop tool response content when using the OpenAI chat completions API with tools. Two separate bugs were responsible:

1. **Tool parser regex mismatch** — the parser expected a newline between the function name and its arguments, but GLM's chat template does not emit one
2. **Content format detection failure** — vLLM incorrectly auto-detected the "openai" content format, causing tool response content to be dropped

---

## Bug #1: Tool Parser Regex Mismatch

### Problem

The `func_detail_regex` in `glm4_moe_tool_parser.py` required a literal newline between the function name and the first argument tag.

The GLM-5.x chat template outputs tool calls without that newline: the function name is immediately followed by the first argument tag. The regex therefore failed to match, causing tool call extraction to fail silently.
### Fix

Changed the regex to use `\s*` (optional whitespace) instead of a mandatory `\n`, and made the arguments group optional so that zero-argument calls also match:

```python
# Before
r"\[TOOL_START\]([^\n]*)\n(.*)\[TOOL_END\]"

# After
r"\[TOOL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_END\]"
```

Also changed `tc_args_raw` to default to an empty string, preventing crashes on zero-argument tool calls.

**File:** `glm4_moe_tool_parser.py`
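The difference can be reproduced with a small regex check. The sample tool-call string below is illustrative (in particular, the `[ARG_VALUE]` tag is an assumption; only `[TOOL_START]`, `[ARG_KEY]`, and `[TOOL_END]` appear in the patterns above):

```python
import re

# Old pattern: requires a literal newline between the name and the arguments.
old_pattern = re.compile(r"\[TOOL_START\]([^\n]*)\n(.*)\[TOOL_END\]", re.DOTALL)

# New pattern: optional whitespace, and the arguments group may be empty.
new_pattern = re.compile(
    r"\[TOOL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_END\]",
    re.DOTALL,
)

# GLM-5.x emits the function name immediately followed by the first argument tag.
call = "[TOOL_START]get_weather[ARG_KEY]city[ARG_VALUE]Berlin[TOOL_END]"

print(old_pattern.search(call))   # None -- the mandatory \n never matches
match = new_pattern.search(call)
print(match.group(1))             # get_weather
print(match.group(2))             # [ARG_KEY]city[ARG_VALUE]Berlin

# Zero-argument calls also match, with an empty arguments group.
zero_arg = new_pattern.search("[TOOL_START]ping[TOOL_END]")
print(repr(zero_arg.group(2)))    # ''
```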
---

## Bug #2: Content Format Detection Failure

### Problem

vLLM's `_detect_content_format()` function analyzes the chat Jinja template to decide whether message content should be rendered as plain strings or as OpenAI-style content-part arrays.

The GLM-5.x template contains a loop, `{% for tr in m.content %}`, for handling tool responses with multiple results. vLLM saw this loop, detected the "openai" format, and converted tool message content to:

```json
[{"type": "text", "text": "the actual content"}]
```

However, the GLM template's first branch checks `{% if m.content is string %}` before reaching that loop. Since an array is not a string, the template took the wrong branch and the content was lost.

The model would then respond *"The function returned no output"* even though valid content was provided.
### Root Cause

The template has two branches for tool messages:

```jinja
{%- if m.content is string %}
{{ '<observations>' + m.content + '</observations>' }}
{%- else %}
{% for tr in m.content %}  {# expects objects with a .name property #}
...
{% endfor %}
{% endif %}
```

vLLM's detection saw the `for` loop and chose the "openai" format. But the `is string` check fails for arrays, and the `else` branch expects objects with a `.name` property that `{"type": "text"}` objects don't have.
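A minimal Jinja2 reproduction shows the failure mode. The template below is a simplified sketch of the excerpt above, not the real GLM template, and the `.output` attribute is a hypothetical stand-in for whatever fields the real loop body reads:

```python
from jinja2 import Template

# Simplified sketch of the GLM template's two branches for tool messages.
tmpl = Template(
    "{% if content is string %}"
    "<observations>{{ content }}</observations>"
    "{% else %}"
    "{% for tr in content %}{{ tr.name }}: {{ tr.output }}{% endfor %}"
    "{% endif %}"
)

# String content (what the template expects) takes the first branch.
print(tmpl.render(content="the actual content"))
# -> <observations>the actual content</observations>

# OpenAI-style array content fails the `is string` test, and the loop then
# looks up `.name`/`.output` attributes that {"type": "text"} dicts don't
# have (they render as empty Undefined values), so the text never reaches
# the prompt.
print(tmpl.render(content=[{"type": "text", "text": "the actual content"}]))
```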
### Fix

Added an `_is_glm_model()` detection function to `vllm/renderers/hf.py` that forces the "string" content format for GLM models, bypassing the incorrect auto-detection:

```python
def _is_glm_model(tokenizer: HfTokenizer, model_config: "ModelConfig") -> bool:
    """Check if this is a GLM model that requires string content format."""
    name_or_path = tokenizer.name_or_path.lower()
    glm_indicators = ["glm-4", "glm-5", "glm4", "glm5", "zai-org/glm"]
    return any(ind in name_or_path for ind in glm_indicators)
```

It is called in `_resolve_chat_template_content_format()` before auto-detection runs.

**File:** `vllm_patches/hf.py`
---

## Files

| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays the patched files onto the base vLLM image |
---

## Deployment

### Docker Build

```bash
docker build -t your-registry/vllm-glm51-patched:latest .
docker push your-registry/vllm-glm51-patched:latest
```
### Kubernetes

Update your deployment to use the patched image and ensure these vLLM args are set:

```yaml
extraArgs:
  - "--tool-call-parser=glm47"
  - "--enable-auto-tool-choice"
```
---

## Verification

Tool response content is now passed through to the model correctly:

```
Model response: The test function was called successfully! It returned the value **42**.
PASS: Model referenced the tool result (42)
```
---

## Related

- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja