# vLLM GLM Tool Parser Patch
Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed
### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it would act as if the response was empty ("The function returned no output") even though valid content was provided.
**Root Cause:** The `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template does NOT include that newline. The regex silently failed to match, tool call extraction produced nothing, and somewhere in that failure path the tool response content was lost.
**Model output format (no newline after name):**
```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
**Old regex (broken):**
```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]" # Requires \n after name
```
**Fixed regex:**
```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
The fix:
- Uses `\s*` instead of mandatory `\n`
- Makes the arguments group optional for zero-argument calls
- Accepts word chars, dots, and hyphens in function names
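A quick illustration of the difference, running both regexes from above against the documented output format (the `get_weather`/`city` payload is made up for the example):

```python
import re

OLD = r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"
NEW = r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"

# GLM-5.1 emits no newline between the function name and the first [ARG_KEY] tag.
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

print(re.search(OLD, sample))          # None -- old pattern never matches
m = re.search(NEW, sample)
print(m.group(1))                      # get_weather
print(m.group(2))                      # [ARG_KEY]city[ARG_END]

# Zero-argument calls also match, with an empty arguments group.
m0 = re.search(NEW, "[TOOL_CALL_START]list_files[TOOL_CALL_END]")
print(m0.group(1), repr(m0.group(2)))  # list_files ''
```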
### Issue 2: Zero-Argument Tool Calls Crash
**Symptom:** `TypeError: 'NoneType' object is not iterable` when tool has no arguments.
**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
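A minimal sketch of why the default matters; the pattern below is illustrative (a fully optional arguments group yields `None` rather than `""`), while `tc_detail` and `tc_args_raw` are the names used in the parser:

```python
import re

# Illustrative pattern: the whole arguments group is optional, so
# group(2) is None (not "") when the call carries no arguments.
pattern = r"\[TOOL_CALL_START\](\w+)(?:\n(.*))?\[TOOL_CALL_END\]"
tc_detail = re.search(pattern, "[TOOL_CALL_START]list_files[TOOL_CALL_END]")

tc_args_raw = tc_detail.group(2) or ""  # the fix: default None to ""
for ch in tc_args_raw:                  # iterating "" is safe; None would raise TypeError
    pass
```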
### Issue 3: Streaming Path vs Non-Streaming Path Inconsistency
The streaming and non-streaming paths previously parsed tool calls differently; both now use the same robust extraction helpers, so a call that parses correctly in one mode parses correctly in the other.
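A sketch of the idea (the helper name `extract_tool_calls` is hypothetical; the real implementations live in `glm4_moe_tool_parser.py` and `utils.py`):

```python
import re

TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Single extraction helper shared by both code paths (hypothetical name)."""
    return TOOL_CALL_RE.findall(text)

# Non-streaming: parse the complete model output in one shot.
full = "[TOOL_CALL_START]get_time[TOOL_CALL_END]"
print(extract_tool_calls(full))   # [('get_time', '')]

# Streaming: accumulate deltas, then run the SAME helper on the buffer,
# so both modes agree on what counts as a tool call.
buffer = ""
for delta in ["[TOOL_CALL_START]get_", "time[TOOL_CALL_END]"]:
    buffer += delta
print(extract_tool_calls(buffer) == extract_tool_calls(full))  # True
```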
## Files
| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `Dockerfile` | Overlays patched files onto base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing
### Requirements
```bash
pip install httpx regex
```
### Running Tests
```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
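The tests talk to an OpenAI-compatible `/chat/completions` endpoint. A sketch of the request body they are assumed to send when feeding a tool response back to the model (`build_chat_payload` is a hypothetical helper; the actual test code is in `tests/test_tool_diagnosis.py`):

```python
import os

def build_chat_payload(messages, tools=None):
    """Build an OpenAI-compatible /chat/completions request body (sketch)."""
    payload = {
        "model": os.environ.get("VLLM_MODEL", "zai-org/GLM-5.1-FP8"),
        "messages": messages,
    }
    if tools is not None:
        payload["tools"] = tools
    return payload

# A follow-up turn that feeds a tool response back to the model.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "22C, sunny"},
]
payload = build_chat_payload(messages)
# Sent roughly as:
# httpx.post(f"{os.environ['VLLM_API_BASE']}/chat/completions", json=payload,
#            headers={"Authorization": f"Bearer {os.environ['VLLM_API_KEY']}"})
```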
### Test Cases
| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies model can see tool response content |
| `test_without_tools_param` | Tests behavior without tools param in follow-up |
| `test_different_content_formats` | String vs array content formats |
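The content-format test contrasts the two message shapes the OpenAI API accepts for a tool result. A sketch of the two forms (exact assertions live in `tests/test_tool_diagnosis.py`; `normalize` is a hypothetical helper for the comparison):

```python
# Two equivalent ways to express the same tool-response content.
string_form = {"role": "tool", "tool_call_id": "call_1",
               "content": "22C, sunny"}
array_form = {"role": "tool", "tool_call_id": "call_1",
              "content": [{"type": "text", "text": "22C, sunny"}]}

def normalize(content):
    """Flatten array-form content to a plain string for comparison."""
    if isinstance(content, str):
        return content
    return "".join(p["text"] for p in content if p.get("type") == "text")

# A correct parser should surface the same text to the model either way.
assert normalize(string_form["content"]) == normalize(array_form["content"])
```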
## Deployment
### Jenkins Pipeline
```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```
### Manual Build
```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images
- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related
- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja