Files

biondizzle d245658d29 Add README and update Jenkinsfile with repository URL

2026-04-13 23:46:48 +00:00

5.8 KiB

Raw Permalink Blame History

vLLM DeepSeek-V3.2 MTP Tool Parser

A robust tool call parser for DeepSeek-V3.2 DSML format, designed to handle multi-token deltas from MTP (Multi-Token Prediction) and EAGLE speculative decoding.

Overview

This project provides a drop-in replacement for the standard vLLM tool parser that is resilient to multi-token streaming. Instead of maintaining incremental state, it re-parses the entire current text on every call, finds all tool call regions, builds JSON arguments, and emits only the newly-added characters. This makes it robust against variable token arrival rates.

Features

Re-parse-and-diff approach: Re-parses the entire text on every streaming call for correctness
Multi-token delta support: Handles any number of tokens arriving per step
Complete and partial tool call handling: Streams both complete and in-progress tool calls
JSON argument construction: Builds proper JSON arguments from parameter tags
Schema-aware type conversion: Converts parameter values according to tool schema
Content extraction: Properly extracts non-tool-call text without swallowing or duplicating content

Installation

Prerequisites

Docker
Access to a vLLM-compatible environment
Python 3.12+

Building the Docker Image

# Build the image
docker build -t vllm-deepseek-v32-mtp:v0.19.0 .

# Or use the provided Jenkins pipeline (see below)

Usage

As a Drop-in Replacement

The parser implements the same interface as the standard vLLM tool parser:

from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser

parser = DeepSeekV32ToolParser(tokenizer, tools)

In Streaming Mode

The parser automatically handles streaming by:

Re-scanning current text for content outside tool-call regions
Finding all <｜DSML｜invoke> regions (complete + partial)
Building JSON args for each and diffing against previous state
Emitting only new content

Tool Call Format

The parser expects the DeepSeek-V3.2 DSML format:

<｜DSML｜function_calls>
<｜DSML｜invoke name="get_weather">
<｜DSML｜parameter name="location" string="true">杭州</｜DSML｜parameter>
<｜DSML｜parameter name="date" string="true">2024-01-16</｜DSML｜parameter>
</｜DSML｜invoke>
</｜DSML｜function_calls>

Jenkins Pipeline

The project includes a Jenkinsfile for CI/CD. The pipeline:

Checks out the repository
Builds the Docker image
Pushes to the specified registry

Pipeline Parameters

IMAGE_TAG: Docker image tag (default: v0.19.0)
GIT_REPO: Git repository URL (optional, uses workspace if empty)
GIT_BRANCH: Git branch to build (default: master)

Environment Variables

REGISTRY: atl.vultrcr.com/vllm
IMAGE_NAME: vllm-deepseek-v32-mtp

Credentials

The pipeline requires Docker registry credentials stored in Jenkins as ATL_VCR_VLLM.

Configuration

Jenkins Setup

Create a new pipeline job named vllm-deepseek-v32-mtp
Configure it to pull from: https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git
Set up the ATL_VCR_VLLM credentials in Jenkins
Run the pipeline

Manual Build

# Set your registry credentials
export DOCKER_REGISTRY_USER=your_user
export DOCKER_REGISTRY_PASS=your_pass

# Build and push
docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0

Development

Testing

The parser includes comprehensive unit tests for:

Content extraction with partial tag overlaps
Invoke region detection (complete and incomplete)
JSON argument construction
Type conversion according to schema
Streaming delta computation

Contributing

Fork the repository
Create a feature branch
Implement your changes
Add tests
Submit a pull request

License

Apache 2.0 - See LICENSE for details.

Architecture

Key Components

_extract_content(): Extracts non-tool-call text while handling partial tag overlaps
_extract_invoke_regions(): Finds both complete and incomplete invoke blocks
_build_args_json_so_far(): Constructs JSON arguments incrementally
_compute_args_diff(): Computes and emits only newly-added characters
extract_tool_calls_streaming(): Main entry point that orchestrates the re-parse-and-diff process

State Management

The parser maintains minimal state between calls:

_sent_content_idx: Position tracker for content extraction
_tool_call_ids: Generated IDs for each tool call
streamed_args_for_tool: Previously sent arguments for diffing
prev_tool_call_arr: Previous tool call state

Troubleshooting

Common Issues

Tool calls not detected:

Ensure the DSML tags are correctly formatted
Verify skip_special_tokens=False in the request
Check that the tool call format matches the expected pattern

Streaming hangs:

Verify the closing tags are present in the model output
Check for partial tag overlaps that might be causing the parser to wait

Type conversion errors:

Ensure your tool schema defines the correct parameter types
Verify that string parameters are marked with string="true"

Support

For issues and questions, please use the project's issue tracker.

vLLM: The main vLLM project
DeepSeek: DeepSeek AI models
MTP: Multi-Token Prediction implementation

Changelog

v0.19.0

Initial release with re-parse-and-diff architecture
Full support for DeepSeek-V3.2 DSML format
Jenkins pipeline integration
Docker build and deployment support

Roadmap

Performance optimizations for very long tool calls
Additional validation and error handling
Support for more parameter types
Integration with additional vLLM features

5.8 KiB Raw Permalink Blame History Unescape Escape