# vLLM GLM Tool Parser Patch
Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed
### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it would act as if the response were empty ("The function returned no output") even though valid content was provided.

**Root Cause:** Two bugs working together:

1. **Tool parser regex mismatch** (`glm4_moe_tool_parser.py`): `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template doesn't emit that newline, so the regex silently failed to match.
2. **Wrong content format detection** (`vllm/renderers/hf.py`): vLLM detected the "openai" content format because the GLM template contains `{% for tr in m.content %}` for tool responses. But the template then checks `m.content is string`, which is False for OpenAI-format content arrays, so the content was dropped.
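A minimal illustration of the second bug (the message payloads below are invented examples, not taken from vLLM's internals). Jinja's `is string` test is equivalent to a Python `isinstance` check, so it passes for plain-string content but fails for the OpenAI array form:

```python
# A tool message in the two content formats (illustrative payloads).
string_form = {"role": "tool", "content": "22 degrees and sunny"}
array_form = {"role": "tool", "content": [{"type": "text", "text": "22 degrees and sunny"}]}

# The template's `m.content is string` check, as Python:
print(isinstance(string_form["content"], str))  # True  -> content is rendered
print(isinstance(array_form["content"], str))   # False -> content is silently dropped
```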
**Model output format (no newline after name):**
```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
**Old regex (broken):**
```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]" # Requires \n after name
```
**Fixed regex:**
```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
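As a quick sanity check, the two patterns can be compared against a sample output string built to match the format shown above (function and argument names invented; compiling with `re.DOTALL` is an assumption here so that multi-line argument values would also match):

```python
import re

# Sample model output with no newline after the function name.
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

old = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]", re.DOTALL)
new = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

print(old.search(sample))           # None: the required "\n" never appears
print(new.search(sample).groups())  # ('get_weather', '[ARG_KEY]city[ARG_END]')
```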
**Content format fix:**
Added `_is_glm_model()` detection to force "string" content format for GLM models, bypassing the incorrect auto-detection.
### Issue 2: Zero-Argument Tool Calls Crash
**Symptom:** `TypeError: 'NoneType' object is not iterable` when tool has no arguments.
**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
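In isolation, the `or ""` default behaves like this (helper name invented for illustration):

```python
def default_args(group_value):
    # Mirrors the fix: coerce a missing capture group (None) to "" so the
    # downstream argument-splitting code always receives an iterable string.
    return group_value or ""

print(repr(default_args(None)))                    # '' instead of None
print(repr(default_args("[ARG_KEY]path[ARG_END]")))  # passed through unchanged
```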
### Issue 3: Streaming Path vs Non-Streaming Path Inconsistency
**Fix:** Both paths now use the same robust extraction helpers, so streaming and non-streaming requests parse tool calls consistently.
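The shared structure can be sketched as follows (helper names invented; the pattern is the fixed regex from above, and the streaming path is simplified to buffering followed by a single parse):

```python
import re

# One extractor, called from both the streaming and non-streaming paths.
_TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Return (function_name, raw_args) pairs; shared by both paths."""
    return [(m.group(1), m.group(2) or "") for m in _TOOL_CALL_RE.finditer(text)]

def parse_non_streaming(full_output: str) -> list[tuple[str, str]]:
    return extract_tool_calls(full_output)

def parse_streaming(chunks: list[str]) -> list[tuple[str, str]]:
    # Streaming accumulates deltas, then reuses the exact same extractor.
    return extract_tool_calls("".join(chunks))
```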
## Files
| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays patched files onto base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing
### Requirements
```bash
pip install httpx regex
```
### Running Tests
```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
### Test Cases
| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies model can see tool response content |
| `test_without_tools_param` | Tests behavior without tools param in follow-up |
| `test_different_content_formats` | String vs array content formats |
## Deployment
### Jenkins Pipeline
```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
-u "admin:TOKEN" \
-d "IMAGE_TAG=latest"
```
### Manual Build
```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images
- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related
- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja