vLLM DeepSeek-V3.2 MTP Tool Parser
A robust tool call parser for DeepSeek-V3.2 DSML format, designed to handle multi-token deltas from MTP (Multi-Token Prediction) and EAGLE speculative decoding.
Overview
This project provides a drop-in replacement for the standard vLLM tool parser that is resilient to multi-token streaming. Instead of maintaining incremental state, it re-parses the entire current text on every call, finds all tool call regions, builds JSON arguments, and emits only the newly-added characters. This makes it robust against variable token arrival rates.
Features
- Re-parse-and-diff approach: Re-parses the entire text on every streaming call for correctness
- Multi-token delta support: Handles any number of tokens arriving per step
- Complete and partial tool call handling: Streams both complete and in-progress tool calls
- JSON argument construction: Builds proper JSON arguments from parameter tags
- Schema-aware type conversion: Converts parameter values according to tool schema
- Content extraction: Properly extracts non-tool-call text without swallowing or duplicating content
Installation
Prerequisites
- Docker
- Access to a vLLM-compatible environment
- Python 3.12+
Building the Docker Image
# Build the image
docker build -t vllm-deepseek-v32-mtp:v0.19.0 .
# Or use the provided Jenkins pipeline (see below)
Usage
As a Drop-in Replacement
The parser implements the same interface as the standard vLLM tool parser:
from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser
parser = DeepSeekV32ToolParser(tokenizer, tools)
In Streaming Mode
The parser automatically handles streaming by:
- Re-scanning current text for content outside tool-call regions
- Finding all
<|DSML|invoke>regions (complete + partial) - Building JSON args for each and diffing against previous state
- Emitting only new content
Tool Call Format
The parser expects the DeepSeek-V3.2 DSML format:
<|DSML|function_calls>
<|DSML|invoke name="get_weather">
<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>
<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>
Jenkins Pipeline
The project includes a Jenkinsfile for CI/CD. The pipeline:
- Checks out the repository
- Builds the Docker image
- Pushes to the specified registry
Pipeline Parameters
IMAGE_TAG: Docker image tag (default:v0.19.0)GIT_REPO: Git repository URL (optional, uses workspace if empty)GIT_BRANCH: Git branch to build (default:master)
Environment Variables
REGISTRY:atl.vultrcr.com/vllmIMAGE_NAME:vllm-deepseek-v32-mtp
Credentials
The pipeline requires Docker registry credentials stored in Jenkins as ATL_VCR_VLLM.
Configuration
Jenkins Setup
- Create a new pipeline job named
vllm-deepseek-v32-mtp - Configure it to pull from:
https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git - Set up the
ATL_VCR_VLLMcredentials in Jenkins - Run the pipeline
Manual Build
# Set your registry credentials
export DOCKER_REGISTRY_USER=your_user
export DOCKER_REGISTRY_PASS=your_pass
# Build and push
docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0
Development
Testing
The parser includes comprehensive unit tests for:
- Content extraction with partial tag overlaps
- Invoke region detection (complete and incomplete)
- JSON argument construction
- Type conversion according to schema
- Streaming delta computation
Contributing
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests
- Submit a pull request
License
Apache 2.0 - See LICENSE for details.
Architecture
Key Components
_extract_content(): Extracts non-tool-call text while handling partial tag overlaps_extract_invoke_regions(): Finds both complete and incomplete invoke blocks_build_args_json_so_far(): Constructs JSON arguments incrementally_compute_args_diff(): Computes and emits only newly-added charactersextract_tool_calls_streaming(): Main entry point that orchestrates the re-parse-and-diff process
State Management
The parser maintains minimal state between calls:
_sent_content_idx: Position tracker for content extraction_tool_call_ids: Generated IDs for each tool callstreamed_args_for_tool: Previously sent arguments for diffingprev_tool_call_arr: Previous tool call state
Troubleshooting
Common Issues
Tool calls not detected:
- Ensure the DSML tags are correctly formatted
- Verify
skip_special_tokens=Falsein the request - Check that the tool call format matches the expected pattern
Streaming hangs:
- Verify the closing tags are present in the model output
- Check for partial tag overlaps that might be causing the parser to wait
Type conversion errors:
- Ensure your tool schema defines the correct parameter types
- Verify that string parameters are marked with
string="true"
Support
For issues and questions, please use the project's issue tracker.
Related Projects
Changelog
v0.19.0
- Initial release with re-parse-and-diff architecture
- Full support for DeepSeek-V3.2 DSML format
- Jenkins pipeline integration
- Docker build and deployment support
Roadmap
- Performance optimizations for very long tool calls
- Additional validation and error handling
- Support for more parameter types
- Integration with additional vLLM features