vLLM GLM-5.x Tool Calling Patches

Fixes two critical bugs that prevent GLM models from working correctly with OpenAI-compatible tool calling in vLLM.

Summary

GLM-5.x models would either crash or silently drop tool response content when using the OpenAI chat completions API with tools. Two separate bugs were responsible:

  1. Tool parser regex mismatch — the parser expected a newline between the function name and the arguments, but GLM's template does not emit one
  2. Content format detection failure — vLLM incorrectly auto-detected "openai" content format, causing tool response content to be dropped

Bug #1: Tool Parser Regex Mismatch

Problem

The func_detail_regex in glm4_moe_tool_parser.py required a literal newline between the function name and the first argument tag.

GLM-5.x's chat template emits tool calls without that newline: the function name is immediately followed by the first argument tag. The regex never matched, so tool call extraction failed silently.

Fix

Changed the regex to use \s* (optional whitespace) instead of a mandatory \n, and made the arguments group optional to support zero-argument calls:

# Before
r"\[TOOL_START\]([^\n]*)\n(.*)\[TOOL_END\]"

# After  
r"\[TOOL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_END\]"

Also fixed tc_args_raw to default to empty string, preventing crashes on zero-argument tool calls.
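A quick sanity check of the old and new patterns against GLM-5.x-style output. The [TOOL_START], [TOOL_END], and [ARG_KEY] tokens come from the regexes above; the [ARG_VALUE] tag in the sample payload is illustrative, not taken from the parser.

```python
import re

# Old pattern: mandatory newline after the function name.
OLD = re.compile(r"\[TOOL_START\]([^\n]*)\n(.*)\[TOOL_END\]")
# Patched pattern: optional whitespace, optional arguments group.
NEW = re.compile(
    r"\[TOOL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_END\]",
    re.DOTALL,
)

# GLM-5.x output: the name is immediately followed by the first argument tag.
call = "[TOOL_START]get_weather[ARG_KEY]city[ARG_VALUE]Berlin[TOOL_END]"
zero_arg = "[TOOL_START]list_tools[TOOL_END]"

assert OLD.search(call) is None             # old regex fails silently
m = NEW.search(call)
print(m.group(1))                           # function name
print(m.group(2))                           # raw arguments
assert NEW.search(zero_arg).group(2) == ""  # zero-argument calls now match
```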

File: glm4_moe_tool_parser.py


Bug #2: Content Format Detection Failure

Problem

vLLM's _detect_content_format() function analyzes Jinja templates to determine whether message content should be formatted as strings or OpenAI-style arrays.

For GLM-5.x, the template contains a loop {% for tr in m.content %} for handling tool responses with multiple results. vLLM saw this loop and detected "openai" format, converting tool message content to:

[{"type": "text", "text": "the actual content"}]

However, the GLM template's first branch checks {% if m.content is string %} before using that loop. Since arrays are not strings, the template took the wrong branch and the content was lost.

The model would respond: "The function returned no output" even though valid content was provided.
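The conversion that "openai" format triggers can be sketched as follows (the helper name is hypothetical; the message shape follows the OpenAI chat API):

```python
def to_openai_parts(message: dict) -> dict:
    """Wrap string content in a one-element text-part array,
    mimicking what vLLM does when it detects "openai" format."""
    if isinstance(message.get("content"), str):
        message = {
            **message,
            "content": [{"type": "text", "text": message["content"]}],
        }
    return message

msg = {"role": "tool", "tool_call_id": "call_1", "content": "result: 42"}
print(to_openai_parts(msg)["content"])
# [{'type': 'text', 'text': 'result: 42'}]
```

It is this wrapped array, not the original string, that reaches the chat template's `is string` check.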

Root Cause

The template has two branches for tool messages:

{%- if m.content is string %}
    {{ '<observations>' + m.content + '</observations>' }}
{%- else %}
    {% for tr in m.content %}  {# expects objects with a .name property #}
    ...
    {% endfor %}
{% endif %}

vLLM's detection saw the for loop and chose "openai" format. But the is string check failed for arrays, and the else branch expected objects with .name properties that {"type": "text"} objects don't have.
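A minimal jinja2 reconstruction of that branching (heavily simplified relative to the real template) demonstrates the silent drop: Jinja renders missing attributes like `.name` as empty strings rather than raising.

```python
from jinja2 import Template

# Simplified stand-in for the GLM tool-message branch.
tmpl = Template(
    "{% if content is string %}"
    "<observations>{{ content }}</observations>"
    "{% else %}"
    "{% for tr in content %}{{ tr.name }}: {{ tr.content }}{% endfor %}"
    "{% endif %}"
)

# "string" format: content survives.
print(tmpl.render(content="result: 42"))

# "openai" format: the array fails the `is string` test, and the loop
# body looks up .name/.content keys that {"type": "text"} objects lack,
# so the actual text is silently dropped.
print(tmpl.render(content=[{"type": "text", "text": "result: 42"}]))
```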

Fix

Added _is_glm_model() detection function to vllm/renderers/hf.py that forces "string" content format for GLM models, bypassing the incorrect auto-detection:

def _is_glm_model(tokenizer: HfTokenizer, model_config: "ModelConfig") -> bool:
    """Check if this is a GLM model that requires string content format."""
    name_or_path = tokenizer.name_or_path.lower()
    glm_indicators = ["glm-4", "glm-5", "glm4", "glm5", "zai-org/glm"]
    return any(ind in name_or_path for ind in glm_indicators)

Called in _resolve_chat_template_content_format() before auto-detection.

File: vllm_patches/hf.py


Files

File                     Description
glm4_moe_tool_parser.py  Fixed tool parser (regex fix)
utils.py                 Utility functions for partial JSON/tag handling
vllm_patches/hf.py       Patched renderer (content format fix)
Dockerfile               Overlays patched files onto base vLLM image

Deployment

Docker Build

docker build -t your-registry/vllm-glm51-patched:latest .
docker push your-registry/vllm-glm51-patched:latest

Kubernetes

Update your deployment to use the patched image and ensure these vLLM args:

extraArgs:
  - "--tool-call-parser=glm47"
  - "--enable-auto-tool-choice"

Verification

Tool response content is now properly passed to the model:

Model response: The test function was called successfully! It returned the value **42**.
PASS: Model referenced the tool result (42)
