README.md

# vLLM GLM-5.x Tool Calling Patches

Fixes two critical bugs that prevent GLM models from working correctly with OpenAI-compatible tool calling in vLLM.

## Summary

GLM-5.x models would either crash or silently drop tool response content when using the OpenAI chat completions API with tools. Two separate bugs were responsible:

1. **Tool parser regex mismatch** — Parser expected newline between function name and arguments, but GLM's template does not include one
2. **Content format detection failure** — vLLM auto-detected "openai" format incorrectly, causing tool response content to be dropped

---

## Bug #1: Tool Parser Regex Mismatch

### Problem

The `func_detail_regex` in `glm4_moe_tool_parser.py` required a literal newline between the function name and the first argument tag.

GLM-5.x chat template outputs tool calls without that newline - the function name is immediately followed by the first argument tag. The regex would fail to match, causing tool call extraction to fail silently.

### Fix

Changed the regex to use `\\s*` (optional whitespace) instead of mandatory `\\n`, and made the arguments group optional for zero-argument calls:

```python
# Before
r"\[TOOL_START\]([^\n]*)\n(.*)\[TOOL_END\]"

# After  
r"\[TOOL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_END\]"
```

Also fixed `tc_args_raw` to default to empty string, preventing crashes on zero-argument tool calls.

**File:** `glm4_moe_tool_parser.py`

---

## Bug #2: Content Format Detection Failure

### Problem

vLLM's `_detect_content_format()` function analyzes Jinja templates to determine whether message content should be formatted as strings or OpenAI-style arrays.

For GLM-5.x, the template contains a loop `{% for tr in m.content %}` for handling tool responses with multiple results. vLLM saw this loop and detected "openai" format, converting tool message content to:

```json
[{"type": "text", "text": "the actual content"}]
```

However, the GLM template's first branch checks `{% if m.content is string %}` before using that loop. Since arrays are not strings, the template took the wrong branch and the content was lost.

The model would respond: *"The function returned no output"* even though valid content was provided.

### Root Cause

The template has two branches for tool messages:

```jinja
{%- if m.content is string %}
    {{ '<observations>' + m.content + '</observations>' }}
{%- else %}
    {% for tr in m.content %}  <!-- expects objects with .name property -->
    ...
{% endif %}
```

vLLM's detection saw the `for` loop and chose "openai" format. But the `is string` check failed for arrays, and the `else` branch expected objects with `.name` properties that `{"type": "text"}` objects don't have.

### Fix

Added `_is_glm_model()` detection function to `vllm/renderers/hf.py` that forces "string" content format for GLM models, bypassing the incorrect auto-detection:

```python
def _is_glm_model(tokenizer: HfTokenizer, model_config: "ModelConfig") -> bool:
    """Check if this is a GLM model that requires string content format."""
    name_or_path = tokenizer.name_or_path.lower()
    glm_indicators = ["glm-4", "glm-5", "glm4", "glm5", "zai-org/glm"]
    return any(ind in name_or_path for ind in glm_indicators)
```

Called in `_resolve_chat_template_content_format()` before auto-detection.

**File:** `vllm_patches/hf.py`

---

## Files

| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays patched files onto base vLLM image |

---

## Deployment

### Docker Build

```bash
docker build -t your-registry/vllm-glm51-patched:latest .
docker push your-registry/vllm-glm51-patched:latest
```

### Kubernetes

Update your deployment to use the patched image and ensure these vLLM args:

```yaml
extraArgs:
  - "--tool-call-parser=glm47"
  - "--enable-auto-tool-choice"
```

---

## Verification

Tool response content is now properly passed to the model:

```
Model response: The test function was called successfully! It returned the value **42**.
PASS: Model referenced the tool result (42)
```

---

## Related

- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`# vLLM GLM-5.x Tool Calling Patches`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`Fixes two critical bugs that prevent GLM models from working correctly with OpenAI-compatible tool calling in vLLM.`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`## Summary`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`GLM-5.x models would either crash or silently drop tool response content when using the OpenAI chat completions API with tools. Two separate bugs were responsible:`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`1. Tool parser regex mismatch — Parser expected newline between function name and arguments, but GLM's template does not include one`
			`2. Content format detection failure — vLLM auto-detected "openai" format incorrectly, causing tool response content to be dropped`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`---`
Add hf.py patch to force string content format for GLM models - Tool response content was being dropped because vLLM detected 'openai' content format incorrectly for GLM templates - Added _is_glm_model() detection to force 'string' format - Updated Dockerfile to include hf.py patch - Added debug tests for tool visibility 2026-04-09 05:20:47 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`## Bug #1: Tool Parser Regex Mismatch`
Add hf.py patch to force string content format for GLM models - Tool response content was being dropped because vLLM detected 'openai' content format incorrectly for GLM templates - Added _is_glm_model() detection to force 'string' format - Updated Dockerfile to include hf.py patch - Added debug tests for tool visibility 2026-04-09 05:20:47 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`### Problem`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			The `func_detail_regex` in `glm4_moe_tool_parser.py` required a literal newline between the function name and the first argument tag.

			`GLM-5.x chat template outputs tool calls without that newline - the function name is immediately followed by the first argument tag. The regex would fail to match, causing tool call extraction to fail silently.`

			`### Fix`

			Changed the regex to use `\\s*` (optional whitespace) instead of mandatory `\\n`, and made the arguments group optional for zero-argument calls:
patch parser 2026-04-09 04:28:22 +00:00
			```python
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`# Before`
			`r"\[TOOL_START\]([^\n])\n(.)\[TOOL_END\]"`

			`# After`
			`r"\[TOOL_START\]\s([\w.\-]+)\s((?:\[ARG_KEY\].)?)\s\[TOOL_END\]"`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00			```
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			Also fixed `tc_args_raw` to default to empty string, preventing crashes on zero-argument tool calls.

			File: `glm4_moe_tool_parser.py`

			`---`

			`## Bug #2: Content Format Detection Failure`

			`### Problem`

			vLLM's `_detect_content_format()` function analyzes Jinja templates to determine whether message content should be formatted as strings or OpenAI-style arrays.

			For GLM-5.x, the template contains a loop `{% for tr in m.content %}` for handling tool responses with multiple results. vLLM saw this loop and detected "openai" format, converting tool message content to:

			```json
			`[{"type": "text", "text": "the actual content"}]`
			```

			However, the GLM template's first branch checks `{% if m.content is string %}` before using that loop. Since arrays are not strings, the template took the wrong branch and the content was lost.

			`The model would respond: "The function returned no output" even though valid content was provided.`

			`### Root Cause`

			`The template has two branches for tool messages:`

			```jinja
			`{%- if m.content is string %}`
			`{{ '<observations>' + m.content + '</observations>' }}`
			`{%- else %}`
			`{% for tr in m.content %} <!-- expects objects with .name property -->`
			`...`
			`{% endif %}`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00			```

Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			vLLM's detection saw the `for` loop and chose "openai" format. But the `is string` check failed for arrays, and the `else` branch expected objects with `.name` properties that `{"type": "text"}` objects don't have.
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`### Fix`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			Added `_is_glm_model()` detection function to `vllm/renderers/hf.py` that forces "string" content format for GLM models, bypassing the incorrect auto-detection:
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			```python
			`def _is_glm_model(tokenizer: HfTokenizer, model_config: "ModelConfig") -> bool:`
			`"""Check if this is a GLM model that requires string content format."""`
			`name_or_path = tokenizer.name_or_path.lower()`
			`glm_indicators = ["glm-4", "glm-5", "glm4", "glm5", "zai-org/glm"]`
			`return any(ind in name_or_path for ind in glm_indicators)`
			```
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			Called in `_resolve_chat_template_content_format()` before auto-detection.
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			File: `vllm_patches/hf.py`

			`---`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
			`## Files`

			`\| File \| Description \|`
			`\|------\|-------------\|`
Add hf.py patch to force string content format for GLM models - Tool response content was being dropped because vLLM detected 'openai' content format incorrectly for GLM templates - Added _is_glm_model() detection to force 'string' format - Updated Dockerfile to include hf.py patch - Added debug tests for tool visibility 2026-04-09 05:20:47 +00:00			\| `glm4_moe_tool_parser.py` \| Fixed tool parser (regex fix) \|
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00			\| `utils.py` \| Utility functions for partial JSON/tag handling \|
Add hf.py patch to force string content format for GLM models - Tool response content was being dropped because vLLM detected 'openai' content format incorrectly for GLM templates - Added _is_glm_model() detection to force 'string' format - Updated Dockerfile to include hf.py patch - Added debug tests for tool visibility 2026-04-09 05:20:47 +00:00			\| `vllm_patches/hf.py` \| Patched renderer (content format fix) \|
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			\| `Dockerfile` \| Overlays patched files onto base vLLM image \|

			`---`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`## Deployment`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`### Docker Build`
patch parser 2026-04-09 04:28:22 +00:00
			```bash
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`docker build -t your-registry/vllm-glm51-patched:latest .`
			`docker push your-registry/vllm-glm51-patched:latest`
patch parser 2026-04-09 04:28:22 +00:00			```

Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`### Kubernetes`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`Update your deployment to use the patched image and ensure these vLLM args:`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			```yaml
			`extraArgs:`
			`- "--tool-call-parser=glm47"`
			`- "--enable-auto-tool-choice"`
patch parser 2026-04-09 04:28:22 +00:00			```

Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`---`
patch parser 2026-04-09 04:28:22 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`## Verification`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`Tool response content is now properly passed to the model:`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
			```
Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`Model response: The test function was called successfully! It returned the value 42.`
			`PASS: Model referenced the tool result (42)`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00			```

Clean up README with full bug analysis for ZAI 2026-04-09 06:21:04 +00:00			`---`
GLM-5.1 tool parser with incremental streaming support 2026-04-08 18:23:12 +00:00
			`## Related`

			`- vLLM Issue #32829 (streaming long string parameters)`
patch parser 2026-04-09 04:28:22 +00:00			`- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja`