# vLLM GLM Tool Parser Patch
Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed
### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it would act as if the response were empty ("The function returned no output") even though valid content was provided.
**Root Cause:** Two bugs working together:
1. **Tool parser regex mismatch** (`glm4_moe_tool_parser.py`): The `func_detail_regex` required a newline between the function name and first argument tag, but GLM-5.1's chat template doesn't include that newline. The regex silently failed to match.
2. **Content format detection wrong** (`vllm/renderers/hf.py`): vLLM detected "openai" content format because the GLM template has `{% for tr in m.content %}` for tool responses. But the template then checks `m.content is string` which is False for OpenAI format arrays, causing content to be dropped.
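Why the array format vanishes can be seen by mimicking the template's check in Python (a sketch of the Jinja behavior, not vLLM code):

```python
def render_tool_content(content):
    # Analogue of the template's `m.content is string` test:
    if isinstance(content, str):
        return content  # "string" format renders as-is
    # OpenAI-format content is a list like [{"type": "text", "text": "..."}],
    # which fails the string check — the template emits nothing for it.
    return ""

print(render_tool_content("result: 42"))                              # rendered
print(render_tool_content([{"type": "text", "text": "result: 42"}]))  # dropped
```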
**Model output format (no newline after name):**
```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
**Old regex (broken):**
```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"  # requires \n after name
```
**Fixed regex:**
```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
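The difference is easy to verify against a sample completion (the payload below is illustrative):

```python
import re

OLD = re.compile(r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]")
NEW = re.compile(r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]")

# GLM-5.1 emits no newline between the name and the first argument tag:
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

assert OLD.search(sample) is None            # old pattern silently fails
m = NEW.search(sample)
assert m.group(1) == "get_weather"
assert m.group(2) == "[ARG_KEY]city[ARG_END]"
```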
**Content format fix:**
Added `_is_glm_model()` detection to force "string" content format for GLM models, bypassing the incorrect auto-detection.
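A minimal sketch of the override (the name-matching heuristic is an assumption; the actual patch lives in `vllm_patches/hf.py`):

```python
def _is_glm_model(model_name: str) -> bool:
    # Assumption: detect GLM checkpoints by name, e.g. "zai-org/GLM-5.1-FP8".
    return "glm" in model_name.lower()

def resolve_content_format(model_name: str, auto_detected: str) -> str:
    # GLM chat templates expect plain-string content, so bypass the
    # (incorrect) auto-detection for those models.
    if _is_glm_model(model_name):
        return "string"
    return auto_detected
```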
### Issue 2: Zero-Argument Tool Calls Crash
**Symptom:** `TypeError: 'NoneType' object is not iterable` when tool has no arguments.
**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
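A condensed illustration of the guard (the pattern here is simplified for the demo, not the parser's exact regex):

```python
import re

# With an optional argument group, group(2) comes back None for a
# zero-argument call; `or ""` normalizes it before iteration.
pattern = re.compile(r"\[TOOL_CALL_START\]([\w.\-]+)(?:\n(.*))?\[TOOL_CALL_END\]")
m = pattern.search("[TOOL_CALL_START]list_files[TOOL_CALL_END]")
tc_args_raw = m.group(2) or ""   # None -> "" so later iteration is safe
```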
### Issue 3: Streaming Path vs Non-Streaming Path Inconsistency
The streaming and non-streaming code paths previously extracted tool calls with separate logic that could diverge. Both paths now use the same extraction helpers for consistent results.
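A hypothetical sketch of such a shared helper (the name and the non-greedy variant of the pattern are assumptions, not the patch's exact code):

```python
import re

# One extraction routine used by both the streaming and non-streaming paths.
TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*?)?)\s*\[TOOL_CALL_END\]",
    re.DOTALL,
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Return (function_name, raw_args) pairs; raw_args may be ""."""
    return [(m.group(1), m.group(2) or "") for m in TOOL_CALL_RE.finditer(text)]
```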
## Files
| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser (regex fix) |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `vllm_patches/hf.py` | Patched renderer (content format fix) |
| `Dockerfile` | Overlays patched files onto the base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing
### Requirements
```bash
pip install httpx regex
```
### Running Tests
```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"

python tests/test_tool_diagnosis.py
```
### Test Cases
| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies the model can see tool response content |
| `test_without_tools_param` | Tests behavior without the `tools` param in the follow-up request |
| `test_different_content_formats` | String vs. array content formats |
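For illustration, the two content shapes exercised by `test_different_content_formats` look roughly like this (field values are made up):

```python
# "string" content format — what the GLM chat template expects:
tool_msg_string = {
    "role": "tool",
    "tool_call_id": "call_1",   # illustrative id
    "content": "temperature: 21C",
}

# OpenAI "array" content format — the shape that was silently
# dropped before the patch:
tool_msg_array = {
    "role": "tool",
    "tool_call_id": "call_1",
    "content": [{"type": "text", "text": "temperature: 21C"}],
}
```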
## Deployment
### Jenkins Pipeline
```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```
### Manual Build
```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images
- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related
- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja