# vLLM GLM Tool Parser Patch
Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed
### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it would act as if the response were empty ("The function returned no output") even though valid content was provided.

**Root Cause:** Two bugs working together:

1. **Tool parser regex mismatch** (`glm4_moe_tool_parser.py`): `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template doesn't emit that newline, so the regex silently failed to match.
2. **Wrong content format detection** (`vllm/renderers/hf.py`): vLLM detected the "openai" content format because the GLM template contains `{% for tr in m.content %}` for tool responses. But the template then checks `m.content is string`, which is False for OpenAI-format content arrays, so the content was dropped.
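A minimal illustration of the second bug (the message payloads below are invented examples, not taken from vLLM's internals). Jinja's `is string` test is equivalent to a Python `isinstance` check, so it passes for plain-string content but fails for the OpenAI array form:

```python
# A tool message in the two content formats (illustrative payloads).
string_form = {"role": "tool", "content": "22 degrees and sunny"}
array_form = {"role": "tool", "content": [{"type": "text", "text": "22 degrees and sunny"}]}

# The template's `m.content is string` check, as Python:
print(isinstance(string_form["content"], str))  # True  -> content is rendered
print(isinstance(array_form["content"], str))   # False -> content is silently dropped
```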
**Model output format (no newline after name):**
```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
**Old regex (broken):**
```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]" # Requires \n after name
```
**Fixed regex:**
```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
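As a quick sanity check, the two patterns can be compared against a sample output string built to match the format shown above (function and argument names invented; compiling with `re.DOTALL` is an assumption here so that multi-line argument values would also match):

```python
import re

# Sample model output with no newline after the function name.
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

old = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]", re.DOTALL)
new = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

print(old.search(sample))           # None: the required "\n" never appears
print(new.search(sample).groups())  # ('get_weather', '[ARG_KEY]city[ARG_END]')
```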
**Content format fix:**
Added `_is_glm_model()` detection to force "string" content format for GLM models, bypassing the incorrect auto-detection.
### Issue 2: Zero-Argument Tool Calls Crash
**Symptom:** `TypeError: 'NoneType' object is not iterable` when tool has no arguments.
**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
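In isolation, the `or ""` default behaves like this (helper name invented for illustration):

```python
def default_args(group_value):
    # Mirrors the fix: coerce a missing capture group (None) to "" so the
    # downstream argument-splitting code always receives an iterable string.
    return group_value or ""

print(repr(default_args(None)))                    # '' instead of None
print(repr(default_args("[ARG_KEY]path[ARG_END]")))  # passed through unchanged
```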
### Issue 3: Streaming Path vs Non-Streaming Path Inconsistency
**Fix:** Both paths now use the same robust extraction helpers, so streaming and non-streaming requests parse tool calls consistently.
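The shared structure can be sketched as follows (helper names invented; the pattern is the fixed regex from above, and the streaming path is simplified to buffering followed by a single parse):

```python
import re

# One extractor, called from both the streaming and non-streaming paths.
_TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Return (function_name, raw_args) pairs; shared by both paths."""
    return [(m.group(1), m.group(2) or "") for m in _TOOL_CALL_RE.finditer(text)]

def parse_non_streaming(full_output: str) -> list[tuple[str, str]]:
    return extract_tool_calls(full_output)

def parse_streaming(chunks: list[str]) -> list[tuple[str, str]]:
    # Streaming accumulates deltas, then reuses the exact same extractor.
    return extract_tool_calls("".join(chunks))
```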
## Files
| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays patched files onto base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing
### Requirements
```bash
pip install httpx regex
```
### Running Tests
```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
### Test Cases
| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies model can see tool response content |
| `test_without_tools_param` | Tests behavior without tools param in follow-up |
| `test_different_content_formats` | String vs array content formats |
## Deployment
### Jenkins Pipeline
```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
-u "admin:TOKEN" \
-d "IMAGE_TAG=latest"
```
### Manual Build
```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images
- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related
- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja