# vLLM GLM Tool Parser Patch
Patches vLLM's GLM-4/GLM-5.1 tool parser to fix multiple issues with tool call handling.
## Issues Fixed
### Issue 1: Tool Response Content Ignored (CRITICAL)
**Symptom:** When the model makes a tool call and receives a response, it would act as if the response was empty ("The function returned no output") even though valid content was provided.
**Root Cause:** The `func_detail_regex` required a newline between the function name and the first argument tag, but GLM-5.1's chat template does NOT include that newline. The regex silently failed to match, tool call extraction produced nothing, and somewhere in that failure path the tool response content was lost.
**Model output format (no newline after name):**
```
[TOOL_CALL_START]function_name[ARG_KEY]value[ARG_END]...[TOOL_CALL_END]
```
**Old regex (broken):**
```python
r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]" # Requires \n after name
```
**Fixed regex:**
```python
r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
```
The fix:
- Uses `\s*` instead of mandatory `\n`
- Makes the arguments group optional for zero-argument calls
- Accepts word chars, dots, and hyphens in function names
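A quick illustration of the difference, running both regexes from above against the documented output format (the `get_weather`/`city` payload is made up for the example):

```python
import re

OLD = r"\[TOOL_CALL_START\]([^\n]*)\n(.*)\[TOOL_CALL_END\]"
NEW = r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"

# GLM-5.1 emits no newline between the function name and the first [ARG_KEY] tag.
sample = "[TOOL_CALL_START]get_weather[ARG_KEY]city[ARG_END][TOOL_CALL_END]"

print(re.search(OLD, sample))          # None -- old pattern never matches
m = re.search(NEW, sample)
print(m.group(1))                      # get_weather
print(m.group(2))                      # [ARG_KEY]city[ARG_END]

# Zero-argument calls also match, with an empty arguments group.
m0 = re.search(NEW, "[TOOL_CALL_START]list_files[TOOL_CALL_END]")
print(m0.group(1), repr(m0.group(2)))  # list_files ''
```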
### Issue 2: Zero-Argument Tool Calls Crash
**Symptom:** `TypeError: 'NoneType' object is not iterable` when tool has no arguments.
**Fix:** `tc_args_raw` now defaults to an empty string: `tc_args_raw = tc_detail.group(2) or ""`
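A minimal sketch of why the default matters; the pattern below is illustrative (a fully optional arguments group yields `None` rather than `""`), while `tc_detail` and `tc_args_raw` are the names used in the parser:

```python
import re

# Illustrative pattern: the whole arguments group is optional, so
# group(2) is None (not "") when the call carries no arguments.
pattern = r"\[TOOL_CALL_START\](\w+)(?:\n(.*))?\[TOOL_CALL_END\]"
tc_detail = re.search(pattern, "[TOOL_CALL_START]list_files[TOOL_CALL_END]")

tc_args_raw = tc_detail.group(2) or ""  # the fix: default None to ""
for ch in tc_args_raw:                  # iterating "" is safe; None would raise TypeError
    pass
```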
### Issue 3: Streaming Path vs Non-Streaming Path Inconsistency
The streaming and non-streaming paths previously parsed tool calls differently; both now use the same robust extraction helpers, so a call that parses correctly in one mode parses correctly in the other.
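A sketch of the idea (the helper name `extract_tool_calls` is hypothetical; the real implementations live in `glm4_moe_tool_parser.py` and `utils.py`):

```python
import re

TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALL_START\]\s*([\w.\-]+)\s*((?:\[ARG_KEY\].*)?)\s*\[TOOL_CALL_END\]"
)

def extract_tool_calls(text: str) -> list[tuple[str, str]]:
    """Single extraction helper shared by both code paths (hypothetical name)."""
    return TOOL_CALL_RE.findall(text)

# Non-streaming: parse the complete model output in one shot.
full = "[TOOL_CALL_START]get_time[TOOL_CALL_END]"
print(extract_tool_calls(full))   # [('get_time', '')]

# Streaming: accumulate deltas, then run the SAME helper on the buffer,
# so both modes agree on what counts as a tool call.
buffer = ""
for delta in ["[TOOL_CALL_START]get_", "time[TOOL_CALL_END]"]:
    buffer += delta
print(extract_tool_calls(buffer) == extract_tool_calls(full))  # True
```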
## Files
| File | Description |
|------|-------------|
| `glm4_moe_tool_parser.py` | Fixed tool parser |
| `utils.py` | Utility functions for partial JSON/tag handling |
| `Dockerfile` | Overlays patched files onto base image |
| `Jenkinsfile` | CI/CD pipeline for building and pushing |
| `tests/` | Test suite for tool call validation |
## Testing
### Requirements
```bash
pip install httpx regex
```
### Running Tests
```bash
export VLLM_API_BASE="https://api.vultrinference.com/v1"
export VLLM_API_KEY="your-api-key"
export VLLM_MODEL="zai-org/GLM-5.1-FP8"
python tests/test_tool_diagnosis.py
```
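The tests talk to an OpenAI-compatible `/chat/completions` endpoint. A sketch of the request body they are assumed to send when feeding a tool response back to the model (`build_chat_payload` is a hypothetical helper; the actual test code is in `tests/test_tool_diagnosis.py`):

```python
import os

def build_chat_payload(messages, tools=None):
    """Build an OpenAI-compatible /chat/completions request body (sketch)."""
    payload = {
        "model": os.environ.get("VLLM_MODEL", "zai-org/GLM-5.1-FP8"),
        "messages": messages,
    }
    if tools is not None:
        payload["tools"] = tools
    return payload

# A follow-up turn that feeds a tool response back to the model.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "22C, sunny"},
]
payload = build_chat_payload(messages)
# Sent roughly as:
# httpx.post(f"{os.environ['VLLM_API_BASE']}/chat/completions", json=payload,
#            headers={"Authorization": f"Bearer {os.environ['VLLM_API_KEY']}"})
```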
### Test Cases
| Test | Description |
|------|-------------|
| `test_simple_tool_response` | Verifies model can see tool response content |
| `test_without_tools_param` | Tests behavior without tools param in follow-up |
| `test_different_content_formats` | String vs array content formats |
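The content-format test contrasts the two message shapes the OpenAI API accepts for a tool result. A sketch of the two forms (exact assertions live in `tests/test_tool_diagnosis.py`; `normalize` is a hypothetical helper for the comparison):

```python
# Two equivalent ways to express the same tool-response content.
string_form = {"role": "tool", "tool_call_id": "call_1",
               "content": "22C, sunny"}
array_form = {"role": "tool", "tool_call_id": "call_1",
              "content": [{"type": "text", "text": "22C, sunny"}]}

def normalize(content):
    """Flatten array-form content to a plain string for comparison."""
    if isinstance(content, str):
        return content
    return "".join(p["text"] for p in content if p.get("type") == "text")

# A correct parser should surface the same text to the model either way.
assert normalize(string_form["content"]) == normalize(array_form["content"])
```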
## Deployment
### Jenkins Pipeline
```bash
curl -X POST "https://jenkins.sweetapi.com/job/vllm-glm-build/buildWithParameters" \
  -u "admin:TOKEN" \
  -d "IMAGE_TAG=latest"
```
### Manual Build
```bash
docker build -t atl.vultrcr.com/vllm/vllm-glm51-patched:latest .
docker push atl.vultrcr.com/vllm/vllm-glm51-patched:latest
```
### Images
- Base: `vllm/vllm-openai:glm51-cu130`
- Output: `atl.vultrcr.com/vllm/vllm-glm51-patched:<tag>`
## Related
- vLLM Issue #32829 (streaming long string parameters)
- GLM-5.1 chat template: https://huggingface.co/zai-org/GLM-5.1-FP8/raw/main/chat_template.jinja