# vLLM GLM Tool Parser Patch

Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.

## Issues Fixed

### Issue 1: Tool Response Content Ignored (CRITICAL)

**Symptom:** When the model makes a tool call and receives a response, it acts as if the response were empty ("The function returned no output") even though valid content was provided.

**Root Cause:** Two bugs working together:

1. **Tool parser regex mismatch** (`glm4_moe_tool_parser.py`): The `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template doesn't emit that newline, so the regex silently failed to match.
2. **Wrong content format detection** (`vllm/renderers/hf.py`): vLLM detected the "openai" content format because the GLM template contains `{% for tr in m.content %}` for tool responses. But the template then checks `m.content is string`, which is `False` for OpenAI-format content arrays, causing the content to be dropped.

**Model output format (no newline after the name):**

```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```

**Old regex (broken):**

```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"  # requires \n after the name
```

**Fixed regex:**

```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```

**Content format fix:** Added `_is_glm_model()` detection to force the "string" content format for GLM models, bypassing the incorrect auto-detection.

### Issue 2: Zero-Argument Tool Calls Crash

**Symptom:** `TypeError: 'NoneType' object is not iterable` when a tool has no arguments.

**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`.

### Issue 3: Streaming vs. Non-Streaming Path Inconsistency

Both paths now use the same robust extraction helpers for consistency.
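As a quick sanity check, the regex change can be exercised in isolation. The tool name and argument values below are hypothetical, but the string follows the no-newline output format shown above:

```python
import re

# Old pattern: requires a newline between the name and arguments,
# which GLM-5.1's no-newline output never contains.
OLD = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]", re.DOTALL)

# Fixed pattern: tolerates the missing newline and zero-argument calls.
NEW = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

# Hypothetical model output in the format shown above (no newline after the name).
out = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END]Paris[TOOL_CALL_END]"

assert OLD.search(out) is None  # old regex silently fails to match
m = NEW.search(out)
assert m.group(1) == "get_weather"
assert m.group(2) == "[ARG_KEY]city[ARG_END]Paris"

# Zero-argument call (Issue 2): group(2) matches empty, and the
# `or ""` default keeps downstream iteration from seeing None.
m2 = NEW.search("[TOOL_CALL_START]list_files[TOOL_CALL_END]")
tc_args_raw = m2.group(2) or ""
assert tc_args_raw == ""
```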
## Files

| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays patched files onto the base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |

## Testing

### Requirements

```bash
pip install httpx regex
```

### Running Tests

```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```

### Test Cases

| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies the model can see tool response content |
| `test_without_tools_param` | Tests behavior without the `tools` param in the follow-up request |
| `test_different_content_formats` | String vs. array content formats |

## Deployment

### Jenkins Pipeline

```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```

### Manual Build

```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```

### Images

- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:`

## Related

- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja
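As a closing illustration of the content-format bug from Issue 1 (the same distinction `test_different_content_formats` exercises): a tool message's `content` can arrive either as a plain string or as an OpenAI-style array of parts, and the unpatched GLM template drops the array form because `m.content is string` is `False` for a list. The message shapes below are illustrative, and `flatten_tool_content` is a hypothetical helper that mirrors the effect of forcing the "string" content format:

```python
def flatten_tool_content(message: dict) -> str:
    """Flatten OpenAI-style content parts into the plain string the GLM
    template expects (hypothetical helper, not the patched vLLM code)."""
    content = message.get("content")
    if isinstance(content, str):
        return content
    # OpenAI array format: a list of {"type": "text", "text": ...} parts.
    return "".join(part.get("text", "") for part in content or [])

# String form: passes the template's `m.content is string` check unchanged.
msg_string = {"role": "tool", "content": "18C and sunny"}

# Array form: dropped by the unpatched template, since a list is not a string.
msg_array = {"role": "tool", "content": [{"type": "text", "text": "18C and sunny"}]}

assert flatten_tool_content(msg_string) == "18C and sunny"
assert flatten_tool_content(msg_array) == "18C and sunny"
```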