# vLLM GLM Tool Parser Patch

Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed

### Issue 1: Tool Response Content Ignored (CRITICAL)

**Symptom:** When the model makes a tool call and receives a response, it acts as if the response were empty ("The function returned no output") even though valid content was provided.
**Root Cause:** Two bugs working together:

1. **Tool parser regex mismatch** (`glm4_moe_tool_parser.py`): The `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template doesn't emit that newline, so the regex silently failed to match.
2. **Wrong content format detection** (`vllm/renderers/hf.py`): vLLM detected the "openai" content format because the GLM template contains `{% for tr in m.content %}` for tool responses. But the template then checks `m.content is string`, which is `False` for OpenAI-format content arrays, so the content was dropped.
**Model output format** (no newline after the name):

```text
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```

**Old regex (broken):**

```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"  # requires \n after the name
```

**Fixed regex:**

```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
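The difference can be checked directly with Python's `re` module. A standalone sketch (the sample tool call below is illustrative, following the output format shown above; the `re.DOTALL` flag is assumed):

```python
import re

# Old pattern: requires "\n" after the function name, which GLM-5.1 never emits.
old_regex = re.compile(
    r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]", re.DOTALL
)
# Fixed pattern: tolerates missing whitespace and an optional argument section.
new_regex = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

output = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

print(old_regex.search(output))   # None -> the tool call is silently dropped
m = new_regex.search(output)
print(m.group(1))                 # get_weather
print(m.group(2))                 # [ARG_KEY]city[ARG_END]
```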
**Content format fix:** Added `_is_glm_model()` detection to force the `"string"` content format for GLM models, bypassing the incorrect auto-detection.
### Issue 2: Zero-Argument Tool Calls Crash

**Symptom:** `TypeError: 'NoneType' object is not iterable` when a tool has no arguments.

**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`.
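A minimal sketch of the guard, using the fixed regex from Issue 1 (the surrounding parser code is simplified away; the `ping` call is illustrative):

```python
import re

func_detail_regex = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

# A tool call with no arguments at all.
tc_detail = func_detail_regex.search("[TOOL_CALL_START]ping[TOOL_CALL_END]")

# Downstream code iterates over / parses the raw argument text; `or ""`
# normalizes the empty case to a string instead of crashing on None.
tc_args_raw = tc_detail.group(2) or ""
print(repr(tc_args_raw))  # ''
```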
### Issue 3: Streaming vs. Non-Streaming Path Inconsistency

Both paths now use the same robust extraction helpers for consistency.
## Files

| File | Description |
|---|---|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays the patched files onto the base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing

### Requirements

```shell
pip install httpx regex
```

### Running Tests

```shell
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
### Test Cases

| Test | Description |
|---|---|
| `test_simple_tool_response` | Verifies the model can see tool response content |
| `test_without_tools_param` | Tests behavior without the `tools` param in a follow-up turn |
| `test_different_content_formats` | String vs. array content formats |
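The first check can be reproduced standalone with a short script (a sketch: the conversation payload is illustrative, and the request only fires when the environment variables from above are set):

```python
import os

# A completed tool-call turn in the standard OpenAI chat format that vLLM serves.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "22C and sunny"},
]

base = os.environ.get("VLLM_API_BASE")
if base:
    import httpx  # third-party dep, only needed when actually calling the server

    resp = httpx.post(
        f"{base}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['VLLM_API_KEY']}"},
        json={"model": os.environ["VLLM_MODEL"], "messages": messages},
        timeout=60,
    )
    answer = resp.json()["choices"][0]["message"]["content"]
    # Before the patch the model answered as if the tool returned nothing;
    # a fixed server should reference the "22C and sunny" result.
    print(answer)
```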
## Deployment

### Jenkins Pipeline

```shell
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```

### Manual Build

```shell
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
## Images

- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related

- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja