# vLLM GLM Tool Parser Patch

Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.

## Issues Fixed

### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it acts as if the response was empty ("The function returned no output") even though valid content was provided.

**Root Cause:** The `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template does NOT emit that newline. The regex silently failed to match, tool call extraction failed, and the tool response content was lost along that failure path.
Model output format (no newline after the name):

```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
Old regex (broken):

```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"  # requires \n after the name
```

Fixed regex:

```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
The fix:

- Uses `\s*` instead of a mandatory `\n`
- Makes the arguments group optional for zero-argument calls
- Accepts word chars, dots, and hyphens in function names
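A minimal repro of the mismatch, using the two regexes above. The function name and argument values are illustrative; the tag layout follows the model output format shown earlier.

```python
import re

OLD = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]")
NEW = re.compile(r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]")

# GLM-5.1 emits no newline after the function name (payload values are illustrative)
out = "[TOOL_CALL_START]get_weather[ARG_KEY]Paris[ARG_END][TOOL_CALL_END]"

print(OLD.search(out))  # None: the old regex silently fails to match
m = NEW.search(out)
print(m.group(1))       # get_weather
print(m.group(2))       # [ARG_KEY]Paris[ARG_END]
```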
### Issue 2: Zero-Argument Tool Calls Crash

**Symptom:** `TypeError: 'NoneType' object is not iterable` when a tool call has no arguments.

**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
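A short sketch of the zero-argument case with the fixed regex (the `list_files` tool name is illustrative):

```python
import re

FIXED = re.compile(r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]")

# Zero-argument call: nothing between the function name and the closing tag
tc_detail = FIXED.search("[TOOL_CALL_START]list_files[TOOL_CALL_END]")
tc_args_raw = tc_detail.group(2) or ""  # defensive default: never None, never iterated as None

print(tc_detail.group(1), repr(tc_args_raw))
```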
### Issue 3: Streaming Path vs. Non-Streaming Path Inconsistency

**Fix:** Both paths now use the same robust extraction helpers for consistency.
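A minimal sketch of the shared-helper pattern. The helper name `extract_tool_call` and its signature are hypothetical, not the parser's actual API; the point is that the streaming path re-runs the same extraction on its growing buffer that the non-streaming path runs once.

```python
import re
from typing import Optional, Tuple

TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
)

def extract_tool_call(buffer: str) -> Optional[Tuple[str, str]]:
    """Hypothetical shared helper: returns (name, raw_args), or None if no complete call yet."""
    m = TOOL_CALL_RE.search(buffer)
    if m is None:
        return None  # streaming caller waits for more tokens; batch caller reports no tool call
    return m.group(1), m.group(2) or ""

# Non-streaming path: parse the full completion once
full = extract_tool_call("[TOOL_CALL_START]ping[TOOL_CALL_END]")

# Streaming path: call the same helper on the growing buffer each step
buf, streamed = "", None
for chunk in ["[TOOL_CALL_START]pi", "ng[TOOL_CALL", "_END]"]:
    buf += chunk
    streamed = extract_tool_call(buf)

print(full, streamed)  # both paths agree once the call is complete
```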
## Files

| File | Description |
|---|---|
| `glm4_moe_tool_parser.py` | Fixed tool parser |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `Dockerfile` | Overlays patched files onto the base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing

### Requirements

```shell
pip install httpx regex
```
### Running Tests

```shell
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
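The core regression (Issue 1) concerns whether the model sees tool response content in the follow-up turn. A sketch of the message sequence the tests exercise; the tool name, payload values, and tool call id are illustrative, and the message shapes follow the OpenAI-compatible chat API that vLLM serves.

```python
import json

tool_call_id = "call_0"  # illustrative; real ids come back from the server

messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # Assistant turn carrying the tool call the server's parser extracted
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
        }],
    },
    # Tool response turn: the content the unpatched parser effectively dropped
    {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps({"temp_c": 18})},
]

# With the patch, the follow-up completion should reference the returned data,
# not claim "The function returned no output".
print(messages[-1]["content"])
```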
### Test Cases

| Test | Description |
|---|---|
| `test_simple_tool_response` | Verifies the model can see tool response content |
| `test_without_tools_param` | Tests behavior without the `tools` param in the follow-up request |
| `test_different_content_formats` | String vs. array content formats |
## Deployment

### Jenkins Pipeline

```shell
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```
### Manual Build

```shell
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images

- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related

- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja