2026-04-09 04:28:22 +00:00

vLLM GLM Tool Parser Patch

Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.

Issues Fixed

Issue 1: Tool Response Content Ignored (CRITICAL)

Symptom: When the model made a tool call and received a response, it acted as if the response were empty ("The function returned no output") even though valid content was provided.

Root Cause: The func_detail_regex required a newline between the function name and the first argument tag, but GLM-5.1's chat template does NOT emit that newline. The regex silently failed to match, tool call extraction failed, and the tool response content was lost along that failure path.

Model output format (no newline after name):

[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]

Old regex (broken):

r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"  # Requires \n after name

Fixed regex:

r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"

The fix:

  • Uses \s* instead of mandatory \n
  • Makes the arguments group optional for zero-argument calls
  • Accepts word chars, dots, and hyphens in function names
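The difference can be checked directly with Python's re module. The sketch below uses a hypothetical function name and argument (get_weather, city) for illustration, and adds re.DOTALL so multi-line argument values still match; the patched parser may compile the pattern with different flags.

```python
import re

# Sample GLM-5.1 output: no newline between the function name and first arg tag.
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

old_re = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]", re.DOTALL)
new_re = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

print(old_re.search(sample))  # None: the old regex demands a newline after the name
m = new_re.search(sample)
print(m.group(1))             # get_weather
print(m.group(2))             # [ARG_KEY]city[ARG_END]
```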

Issue 2: Zero-Argument Tool Calls Crash

Symptom: TypeError: 'NoneType' object is not iterable when tool has no arguments.

Fix: tc_args_raw now defaults to an empty string: tc_args_raw = tc_detail.group(2) or ""
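A minimal sketch of the guard, assuming the fixed regex from Issue 1 (with re.DOTALL added). Note that with this particular pattern group(2) already yields "" rather than None for zero-argument calls, so the `or ""` is a defensive default for parser variants where the group can be None.

```python
import re

detail_re = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

# Zero-argument call: the optional arguments group matches the empty string.
tc_detail = detail_re.search("[TOOL_CALL_START]list_files[TOOL_CALL_END]")
tc_args_raw = tc_detail.group(2) or ""  # never None, so iteration is safe
print(repr(tc_args_raw))                # ''
```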

Issue 3: Streaming Path vs Non-Streaming Path Inconsistency

Both paths now use the same robust extraction helpers for consistency.
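A shared helper of this shape is one way to keep the two paths in sync; the function name extract_tool_call below is hypothetical (the actual helpers live in utils.py), and the regex uses a lazy arguments group so multiple tool calls in one buffer do not get merged.

```python
import re
from typing import Optional, Tuple

TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*?)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

def extract_tool_call(text: str) -> Optional[Tuple[str, str]]:
    """Return (name, raw_args) for the first complete tool call, or None.

    The streaming path can call this on its accumulated buffer once
    [TOOL_CALL_END] arrives; the non-streaming path calls it on the
    full model output, so both share one extraction code path.
    """
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""
```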

Files

  • glm4_moe_tool_parser.py: Fixed tool parser
  • utils.py: Utility functions for partial JSON/tag handling
  • Dockerfile: Overlays patched files onto the base image
  • Jenkinsfile: CI/CD pipeline for building and pushing
  • tests/: Test suite for tool call validation

Testing

Requirements

pip install httpx regex

Running Tests

export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"

python tests/test_tool_diagnosis.py

Test Cases

  • test_simple_tool_response: Verifies the model can see tool response content
  • test_without_tools_param: Tests behavior when the follow-up request omits the tools param
  • test_different_content_formats: Compares string vs. array content formats

Deployment

Jenkins Pipeline

curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"

Manual Build

docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest

Images

  • Base: vllm/vllm-openai:glm51-cu130
  • Output: atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>