# vLLM GLM Tool Parser Patch

## Purpose

Patches vLLM's GLM-4/GLM-5.1 tool parser to fix a streaming issue where long string parameters are buffered entirely before being emitted, causing multi-second delays.

## The Problem

GLM models emit tool calls in a special XML-like format:

```
<tool_call>tool_name
<arg_key>param_name</arg_key><arg_value>param_value</arg_value>
</tool_call>
```

The upstream parser (as of vLLM issue #32829) buffers each string value until its closing tag arrives. For long strings (e.g., 4000+ characters of code), users see nothing until the entire value is complete, which is not true streaming.
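
The buffering effect can be illustrated with a minimal sketch. It assumes the parser extracts arguments by matching complete `<arg_key>`/`<arg_value>` pairs; the pattern and the `parse_complete_args` helper are hypothetical and written for illustration, not copied from vLLM.

```python
import re

# Illustrative pattern, not copied from vLLM: it matches only complete
# <arg_key>/<arg_value> pairs, so a value without its closing tag is invisible.
COMPLETE_PAIR = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_complete_args(text: str) -> dict[str, str]:
    """Return only the arguments whose closing tags have already arrived."""
    return {key: value for key, value in COMPLETE_PAIR.findall(text)}


# Hypothetical tool call, cut off in the middle of a long string value.
partial = "<tool_call>write_file\n<arg_key>content</arg_key><arg_value>def main():"
complete = partial + "\n    pass</arg_value>\n</tool_call>"

print(parse_complete_args(partial))   # nothing can be emitted yet
print(parse_complete_args(complete))  # the whole value becomes available at once
```

Until `</arg_value>` arrives, the partial text yields no arguments at all, which is the delay described above.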

## The Fix

`glm4_moe_tool_parser.py` implements incremental string streaming:

- Re-parses `<tool_call>` regions on each streaming call
- Diffs against previously sent content
- Emits only new characters as they arrive
- String values now stream character-by-character
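
The re-parse, diff, and emit steps above can be sketched roughly as follows. This is a simplified illustration, not the code in `glm4_moe_tool_parser.py`: the `visible_value` helper and `IncrementalValueStreamer` class are hypothetical names, the sketch tracks only the first `<arg_value>` in the text, and it ignores JSON escaping of the emitted deltas.

```python
ARG_VALUE_OPEN = "<arg_value>"
ARG_VALUE_CLOSE = "</arg_value>"


def visible_value(text: str) -> str:
    """Return the part of the first <arg_value> that is safe to emit so far."""
    start = text.find(ARG_VALUE_OPEN)
    if start == -1:
        return ""
    value = text[start + len(ARG_VALUE_OPEN):]
    end = value.find(ARG_VALUE_CLOSE)
    if end != -1:
        return value[:end]
    # Hold back a trailing fragment that could be the start of the closing tag.
    for size in range(min(len(ARG_VALUE_CLOSE) - 1, len(value)), 0, -1):
        if ARG_VALUE_CLOSE.startswith(value[-size:]):
            return value[:-size]
    return value


class IncrementalValueStreamer:
    """Re-parses the accumulated text on each call and emits only the new suffix."""

    def __init__(self) -> None:
        self.text = ""  # everything the model has produced so far
        self.sent = ""  # the portion of the value already emitted

    def feed(self, chunk: str) -> str:
        self.text += chunk
        current = visible_value(self.text)
        delta = current[len(self.sent):]  # diff against previously sent content
        self.sent = current
        return delta


# Hypothetical chunks, split at awkward places on purpose.
chunks = [
    "<tool_call>write_file\n<arg_key>content</arg_key><arg_",
    "value>def ",
    "main():",
    "\n    pass</arg",
    "_value>\n</tool_call>",
]
streamer = IncrementalValueStreamer()
deltas = [streamer.feed(chunk) for chunk in chunks]
print(deltas)
```

Holding back a trailing fragment such as `</arg` keeps part of the closing tag from leaking into the output when a chunk boundary falls inside the tag.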

## Files

| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser with incremental streaming |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `Dockerfile` | Overlays patched files onto base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |

## Deployment

### Jenkins Pipeline

Build via Jenkins:

```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```

Parameters:
- `IMAGE_TAG` - Docker image tag (default: `latest`)
- `GIT_REPO` - Git repository URL (optional, uses workspace if empty)
- `GIT_BRANCH` - Git branch to build (default: `master`)
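
For scripted builds, the same request could be prepared from Python. The sketch below is an assumption-laden equivalent of the curl command: `build_request` is a hypothetical helper, it uses HTTP basic auth with the same `admin:TOKEN` placeholder, and it only constructs the request without sending it.

```python
import base64
import urllib.parse
import urllib.request

JENKINS_URL = "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters"


def build_request(user: str, token: str, **params: str) -> urllib.request.Request:
    """Prepare the POST that the curl command sends, without sending it."""
    body = urllib.parse.urlencode(params).encode("ascii")
    credentials = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(JENKINS_URL, data=body, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request


# "admin" and "TOKEN" are the placeholders from the curl example.
request = build_request("admin", "TOKEN", IMAGE_TAG="latest", GIT_BRANCH="master")
# Passing `request` to urllib.request.urlopen would trigger the build;
# that call is deliberately left out of this sketch.
print(request.full_url, request.data)
```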

### Manual Build

```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```

### Images

- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`

## Related

- vLLM Issue #32829 (streaming long string parameters)