# vLLM DeepSeek-V3.2 MTP Tool Parser

A robust tool call parser for the DeepSeek-V3.2 DSML format, designed to handle multi-token deltas from MTP (Multi-Token Prediction) and EAGLE speculative decoding.

## Overview

This project provides a drop-in replacement for the standard vLLM tool parser that is resilient to multi-token streaming. Instead of maintaining incremental parse state, it re-parses the entire current text on every call, finds all tool call regions, builds JSON arguments, and emits only the newly added characters. This makes it robust to however many tokens arrive per step.

## Features

- **Re-parse-and-diff approach**: Re-parses the entire text on every streaming call for correctness
- **Multi-token delta support**: Handles any number of tokens arriving per step
- **Complete and partial tool call handling**: Streams both complete and in-progress tool calls
- **JSON argument construction**: Builds proper JSON arguments from parameter tags
- **Schema-aware type conversion**: Converts parameter values according to the tool schema
- **Content extraction**: Extracts non-tool-call text without swallowing or duplicating content

## Installation

### Prerequisites

- Docker
- Access to a vLLM-compatible environment
- Python 3.12+

### Building the Docker Image

```bash
# Build the image
docker build -t vllm-deepseek-v32-mtp:v0.19.0 .

# Or use the provided Jenkins pipeline (see below)
```

## Usage

### As a Drop-in Replacement

The parser implements the same interface as the standard vLLM tool parser:

```python
from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser

# `tokenizer` and `tools` come from your existing vLLM serving setup
parser = DeepSeekV32ToolParser(tokenizer, tools)
```

### In Streaming Mode

On every streaming call, the parser:

1. Re-scans the current text for content outside tool-call regions
2. Finds all `<|DSML|invoke>` regions (complete and partial)
3. Builds the JSON arguments for each region and diffs them against the previous state
4. Emits only the new content

## Tool Call Format

The parser expects the DeepSeek-V3.2 DSML format:

```
<|DSML|function_calls>
<|DSML|invoke name="get_weather">
<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>
<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>
```

## Jenkins Pipeline

The project includes a Jenkinsfile for CI/CD. The pipeline:

1. Checks out the repository
2. Builds the Docker image
3. Pushes to the specified registry

### Pipeline Parameters

- `IMAGE_TAG`: Docker image tag (default: `v0.19.0`)
- `GIT_REPO`: Git repository URL (optional, uses the workspace if empty)
- `GIT_BRANCH`: Git branch to build (default: `master`)

### Environment Variables

- `REGISTRY`: `atl.vultrcr.com/vllm`
- `IMAGE_NAME`: `vllm-deepseek-v32-mtp`

### Credentials

The pipeline requires Docker registry credentials stored in Jenkins as `ATL_VCR_VLLM`.

## Configuration

### Jenkins Setup

1. Create a new pipeline job named `vllm-deepseek-v32-mtp`
2. Configure it to pull from: `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`
3. Set up the `ATL_VCR_VLLM` credentials in Jenkins
4. Run the pipeline

### Manual Build

```bash
# Set your registry credentials
export DOCKER_REGISTRY_USER=your_user
export DOCKER_REGISTRY_PASS=your_pass

# Log in to the registry so the push is authorized
echo "$DOCKER_REGISTRY_PASS" | docker login atl.vultrcr.com -u "$DOCKER_REGISTRY_USER" --password-stdin

# Build and push
docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0
```

## Development

### Testing

The parser includes unit tests for:

- Content extraction with partial tag overlaps
- Invoke region detection (complete and incomplete)
- JSON argument construction
- Type conversion according to schema
- Streaming delta computation

### Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests
5. Submit a pull request

## License

Apache 2.0 - See [LICENSE](LICENSE) for details.

## Architecture

### Key Components

- **`_extract_content()`**: Extracts non-tool-call text while handling partial tag overlaps
- **`_extract_invoke_regions()`**: Finds both complete and incomplete invoke blocks
- **`_build_args_json_so_far()`**: Constructs JSON arguments incrementally
- **`_compute_args_diff()`**: Computes and emits only newly added characters
- **`extract_tool_calls_streaming()`**: Main entry point that orchestrates the re-parse-and-diff process

### State Management

The parser maintains minimal state between calls:

- `_sent_content_idx`: Position tracker for content extraction
- `_tool_call_ids`: Generated IDs for each tool call
- `streamed_args_for_tool`: Previously sent arguments for diffing
- `prev_tool_call_arr`: Previous tool call state

## Troubleshooting

### Common Issues

**Tool calls not detected**:

- Ensure the DSML tags are correctly formatted
- Verify `skip_special_tokens=False` in the request
- Check that the tool call format matches the expected pattern

**Streaming hangs**:

- Verify the closing tags are present in the model output
- Check for partial tag overlaps that might be causing the parser to wait

**Type conversion errors**:

- Ensure your tool schema defines the correct parameter types
- Verify that string parameters are marked with `string="true"`

## Support

For issues and questions, please use the project's issue tracker.

## Related Projects

- [vLLM](https://github.com/vllm-project/vllm): The main vLLM project
- [DeepSeek](https://github.com/deepseek-ai): DeepSeek AI models
- [MTP](https://github.com/vllm-project/vllm): Multi-Token Prediction implementation

## Changelog

### v0.19.0

- Initial release with re-parse-and-diff architecture
- Full support for DeepSeek-V3.2 DSML format
- Jenkins pipeline integration
- Docker build and deployment support

## Roadmap

- Performance optimizations for very long tool calls
- Additional validation and error handling
- Support for more parameter types
- Integration with additional vLLM features
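
## Example: Re-Parse-and-Diff Sketch

The re-parse-and-diff idea from the Overview and Architecture sections can be illustrated with a small standalone sketch. This is not the project's code: `ReparseDiffSketch`, `build_args_json_so_far`, and `stream_in_chunks` are hypothetical names, the tag spellings copy the ASCII form used in this README, and typing is decided from the `string` attribute rather than from a tool schema. It only shows that re-parsing the full text on every call and emitting the unsent suffix gives the same result regardless of how many characters arrive per step.

```python
import json
import re

# Assumption: tags are spelled exactly as in this README's format example.
INVOKE_OPEN = re.compile(r'<\|DSML\|invoke name="([^"]+)">')
INVOKE_CLOSE = "</|DSML|invoke>"
PARAM = re.compile(
    r'<\|DSML\|parameter name="([^"]+)" string="(true|false)">'
    r"(.*?)</\|DSML\|parameter>",
    re.DOTALL,
)


def build_args_json_so_far(body: str, complete: bool) -> str:
    """Serialize every fully closed parameter in `body` as a JSON prefix."""
    parts = []
    for name, is_string, raw in PARAM.findall(body):
        # Simplification: non-string values are parsed as JSON literals.
        value = raw if is_string == "true" else json.loads(raw)
        parts.append(f"{json.dumps(name)}: {json.dumps(value, ensure_ascii=False)}")
    text = "{" + ", ".join(parts)
    # The closing brace is only added once the invoke block is closed.
    return text + "}" if complete else text


class ReparseDiffSketch:
    """Keeps only the argument text already emitted for each tool call."""

    def __init__(self) -> None:
        self.streamed_args: list[str] = []

    def step(self, current_text: str) -> list[tuple[int, str]]:
        """Re-parse all of `current_text`; return (tool_index, new_chars) pairs."""
        deltas = []
        for index, match in enumerate(INVOKE_OPEN.finditer(current_text)):
            start = match.end()
            end = current_text.find(INVOKE_CLOSE, start)
            complete = end != -1
            body = current_text[start:end] if complete else current_text[start:]
            args = build_args_json_so_far(body, complete)
            while len(self.streamed_args) <= index:
                self.streamed_args.append("")
            sent = self.streamed_args[index]
            # Emit only the suffix that has not been sent yet.
            if args.startswith(sent) and len(args) > len(sent):
                deltas.append((index, args[len(sent):]))
                self.streamed_args[index] = args
        return deltas


FULL_TEXT = (
    "<|DSML|function_calls>\n"
    '<|DSML|invoke name="get_weather">\n'
    '<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>\n'
    '<|DSML|parameter name="days" string="false">3</|DSML|parameter>\n'
    "</|DSML|invoke>\n"
    "</|DSML|function_calls>"
)


def stream_in_chunks(full_text: str, chunk_size: int) -> str:
    """Simulate `chunk_size` characters arriving per step; join all deltas."""
    parser = ReparseDiffSketch()
    emitted = ""
    for upto in range(chunk_size, len(full_text) + chunk_size, chunk_size):
        for _, delta in parser.step(full_text[:upto]):
            emitted += delta
    return emitted


if __name__ == "__main__":
    for size in (1, 7, 50):
        print(size, stream_in_chunks(FULL_TEXT, size))
```

Because each call derives its output from the whole text plus the record of what was already sent, a step that delivers one character and a step that delivers several complete parameters go through the same code path. The trade-off is repeated scanning of the same text, which is why long tool calls are the case most likely to need optimization.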