# vLLM DeepSeek-V3.2 MTP Tool Parser
A robust tool call parser for the DeepSeek-V3.2 DSML format, designed to handle the multi-token deltas produced by MTP (Multi-Token Prediction) and EAGLE speculative decoding.
## Overview
This project provides a drop-in replacement for vLLM's built-in DeepSeek-V3.2 tool parser that is resilient to multi-token streaming. Instead of updating parser state token by token, it re-parses the entire current text on every call, finds all tool call regions, builds their JSON arguments, and emits only the characters added since the previous call. This makes it robust to variable token arrival rates, such as several tokens arriving in a single step.
## Features
- **Re-parse-and-diff approach**: Re-parses the entire text on every streaming call for correctness
- **Multi-token delta support**: Handles any number of tokens arriving per step
- **Complete and partial tool call handling**: Streams both complete and in-progress tool calls
- **JSON argument construction**: Builds proper JSON arguments from parameter tags
- **Schema-aware type conversion**: Converts parameter values according to tool schema
- **Content extraction**: Properly extracts non-tool-call text without swallowing or duplicating content
## Installation
### Prerequisites
- Docker
- Access to a vLLM-compatible environment
- Python 3.12+
### Building the Docker Image
```bash
# Build the image
docker build -t vllm-deepseek-v32-mtp:v0.19.0 .
# Or use the provided Jenkins pipeline (see below)
```
## Usage
### As a Drop-in Replacement
The parser implements the same interface as the standard vLLM tool parser:
```python
from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser
# `tokenizer` is the served model's tokenizer; `tools` are the request's tool definitions
parser = DeepSeekV32ToolParser(tokenizer, tools)
```
### In Streaming Mode
The parser automatically handles streaming by:
1. Re-scanning current text for content outside tool-call regions
2. Finding all `<｜DSML｜invoke>` regions (complete and partial)
3. Building the JSON arguments for each region and diffing them against what was previously sent
4. Emitting only new content
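Steps 2 and 3 can be sketched as two pure functions over the current text. This is a minimal illustration under stated assumptions, not the project's `_extract_invoke_regions()` or `_build_args_json_so_far()`: the function names and regexes are hypothetical, only parameters whose closing tag has arrived are included, and the tags are assumed to use DeepSeek-V3.2's full-width vertical bars (`<｜DSML｜invoke name="...">`).

```python
import json
import re

# Assumption: the DSML marker is spelled with full-width vertical bars
# (U+FF5C), as in DeepSeek-V3.2's encoding: <｜DSML｜invoke name="...">.
D = re.escape("｜DSML｜")
INVOKE_OPEN = re.compile(rf'<{D}invoke name="([^"]*)">')
INVOKE_CLOSE = re.compile(rf"</{D}invoke>")
# Only parameters whose closing tag has already arrived are matched.
PARAM = re.compile(
    rf'<{D}parameter name="([^"]*)" string="(true|false)">(.*?)</{D}parameter>',
    re.DOTALL,
)


def extract_invoke_regions(text):
    """Return (name, body, is_complete) for each invoke block, including a
    trailing block whose closing tag has not been generated yet."""
    regions = []
    for match in INVOKE_OPEN.finditer(text):
        close = INVOKE_CLOSE.search(text, match.end())
        if close is None:
            regions.append((match.group(1), text[match.end():], False))
        else:
            regions.append((match.group(1), text[match.end():close.start()], True))
    return regions


def build_args_json_so_far(body, is_complete):
    """Build the JSON arguments from the parameters completed so far.

    The closing brace is added only once the invoke block is complete, so each
    result extends the previous one and can be diffed as a plain prefix."""
    parts = []
    for name, is_string, raw in PARAM.findall(body):
        # string="true" values are raw text; others are assumed to be JSON.
        value = raw if is_string == "true" else json.loads(raw)
        key = json.dumps(name, ensure_ascii=False)
        parts.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    if not parts:
        return "{}" if is_complete else ""
    args = "{" + ", ".join(parts)
    return args + "}" if is_complete else args


text = (
    '<｜DSML｜function_calls>\n<｜DSML｜invoke name="get_weather">\n'
    '<｜DSML｜parameter name="location" string="true">杭州</｜DSML｜parameter>\n'
    '<｜DSML｜parameter name="da'
)
for name, body, done in extract_invoke_regions(text):
    print(name, done, build_args_json_so_far(body, done))
# get_weather False {"location": "杭州"
```

Leaving the object open until `</｜DSML｜invoke>` arrives is one way to keep each rebuilt string a prefix of the next one, which is what makes a plain character diff in step 4 safe.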
## Tool Call Format
The parser expects the DeepSeek-V3.2 DSML format:
```
<｜DSML｜function_calls>
<｜DSML｜invoke name="get_weather">
<｜DSML｜parameter name="location" string="true">杭州</｜DSML｜parameter>
<｜DSML｜parameter name="date" string="true">2024-01-16</｜DSML｜parameter>
</｜DSML｜invoke>
</｜DSML｜function_calls>
```
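For a finished block, the format above reduces to a list of calls, each with a name and an arguments object. The sketch below is a non-streaming illustration, assuming that `string="true"` marks raw text and any other value is JSON-encoded, that the tags use full-width vertical bars (`｜DSML｜`), and that tools are given in the OpenAI-style `{"type": "function", "function": {...}}` shape. The schema-aware fallback in `convert()` (coercing a string-marked value to an `integer`, `number`, or `boolean` schema type) is a hypothetical policy, as is the `days` parameter in the usage example.

```python
import json
import re

D = re.escape("｜DSML｜")  # assumed full-width-bar spelling of the marker
INVOKE = re.compile(rf'<{D}invoke name="([^"]*)">(.*?)</{D}invoke>', re.DOTALL)
PARAM = re.compile(
    rf'<{D}parameter name="([^"]*)" string="(true|false)">(.*?)</{D}parameter>',
    re.DOTALL,
)


def convert(raw, is_string, schema_type):
    """Convert one parameter value, using the tool schema as a hint."""
    if is_string == "false":
        return json.loads(raw)  # non-string values are assumed to be JSON
    # Hypothetical fallback: a value marked as a string whose schema type is
    # numeric or boolean is coerced to that type.
    if schema_type == "integer":
        return int(raw)
    if schema_type == "number":
        return float(raw)
    if schema_type == "boolean":
        return raw.strip().lower() == "true"
    return raw


def parse_tool_calls(text, tools):
    """Return [{"name": ..., "arguments": {...}}] for each complete invoke."""
    # tool name -> {parameter name: JSON Schema type}, from OpenAI-style tools
    schemas = {}
    for tool in tools:
        fn = tool["function"]
        props = fn.get("parameters", {}).get("properties", {})
        schemas[fn["name"]] = {p: spec.get("type") for p, spec in props.items()}
    calls = []
    for name, body in INVOKE.findall(text):
        types = schemas.get(name, {})
        args = {p: convert(raw, s, types.get(p)) for p, s, raw in PARAM.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


tools = [{"type": "function", "function": {
    "name": "get_weather",
    "parameters": {"type": "object", "properties": {
        "location": {"type": "string"},
        "days": {"type": "integer"},  # hypothetical extra parameter
    }},
}}]
block = """<｜DSML｜function_calls>
<｜DSML｜invoke name="get_weather">
<｜DSML｜parameter name="location" string="true">杭州</｜DSML｜parameter>
<｜DSML｜parameter name="days" string="true">3</｜DSML｜parameter>
</｜DSML｜invoke>
</｜DSML｜function_calls>"""
print(parse_tool_calls(block, tools))
# [{'name': 'get_weather', 'arguments': {'location': '杭州', 'days': 3}}]
```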
## Jenkins Pipeline
The project includes a Jenkinsfile for CI/CD. The pipeline:
1. Checks out the repository
2. Builds the Docker image
3. Pushes to the specified registry
### Pipeline Parameters
- `IMAGE_TAG`: Docker image tag (default: `v0.19.0`)
- `GIT_REPO`: Git repository URL (optional, uses workspace if empty)
- `GIT_BRANCH`: Git branch to build (default: `master`)
### Environment Variables
- `REGISTRY`: `atl.vultrcr.com/vllm`
- `IMAGE_NAME`: `vllm-deepseek-v32-mtp`
### Credentials
The pipeline requires Docker registry credentials stored in Jenkins as `ATL_VCR_VLLM`.
## Configuration
### Jenkins Setup
1. Create a new pipeline job named `vllm-deepseek-v32-mtp`
2. Configure it to pull from: `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`
3. Set up the `ATL_VCR_VLLM` credentials in Jenkins
4. Run the pipeline
### Manual Build
```bash
# Set your registry credentials
export DOCKER_REGISTRY_USER=your_user
export DOCKER_REGISTRY_PASS=your_pass
# Log in to the registry
echo "$DOCKER_REGISTRY_PASS" | docker login atl.vultrcr.com -u "$DOCKER_REGISTRY_USER" --password-stdin
# Build and push
docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0
```
## Development
### Testing
The unit tests cover:
- Content extraction with partial tag overlaps
- Invoke region detection (complete and incomplete)
- JSON argument construction
- Type conversion according to schema
- Streaming delta computation
### Contributing
1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests
5. Submit a pull request
## License
Apache 2.0 - See [LICENSE](LICENSE) for details.
## Architecture
### Key Components
- **`_extract_content()`**: Extracts non-tool-call text while handling partial tag overlaps
- **`_extract_invoke_regions()`**: Finds both complete and incomplete invoke blocks
- **`_build_args_json_so_far()`**: Builds the JSON arguments string from the parameters parsed so far
- **`_compute_args_diff()`**: Diffs the rebuilt arguments against what was already streamed and returns only the newly-added characters
- **`extract_tool_calls_streaming()`**: Main entry point that orchestrates the re-parse-and-diff process
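Content extraction with partial tag overlaps can be illustrated with a function that recomputes the visible text from scratch on every call and holds back any trailing fragment that might still become a block tag. This is an assumption-laden stand-in for `_extract_content()`, not its actual code: the names `visible_content` and `ContentStreamer` are hypothetical, only the outer `function_calls` tags are considered, and the full-width-bar tag spelling (`<｜DSML｜function_calls>`) is assumed.

```python
# Assumed full-width-bar spelling of the DeepSeek-V3.2 DSML block tags.
START = "<｜DSML｜function_calls>"
END = "</｜DSML｜function_calls>"


def visible_content(text):
    """Return the text outside tool-call blocks that is safe to emit now."""
    out, pos = [], 0
    while True:
        start = text.find(START, pos)
        if start == -1:
            tail = text[pos:]
            # Hold back the longest suffix that could still grow into START,
            # e.g. a trailing "<｜DS" (a partial tag overlap).
            for k in range(min(len(tail), len(START) - 1), 0, -1):
                if START.startswith(tail[-k:]):
                    tail = tail[:-k]
                    break
            out.append(tail)
            return "".join(out)
        out.append(text[pos:start])
        end = text.find(END, start)
        if end == -1:
            return "".join(out)  # inside an unfinished block: no content yet
        pos = end + len(END)


class ContentStreamer:
    """Re-derives the visible content on every call; emits only the new part."""

    def __init__(self):
        self.sent_idx = 0  # same role as _sent_content_idx

    def delta(self, current_text):
        content = visible_content(current_text)
        new = content[self.sent_idx:]
        self.sent_idx = len(content)
        return new


streamer = ContentStreamer()
text = ""
for chunk in ["Let me check. <", '｜DSML｜function_calls>\n<｜DSML｜invoke name="x">',
              "</｜DSML｜invoke>\n</｜DSML｜function_calls>", "\nDone."]:
    text += chunk  # one or several tokens per step
    print(repr(streamer.delta(text)))
# 'Let me check. '
# ''
# ''
# '\nDone.'
```

Because a held-back fragment is either released later as text or absorbed into a real tag, the visible content only ever grows, so slicing from the sent index neither duplicates nor drops characters.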
### State Management
The parser maintains minimal state between calls:
- `_sent_content_idx`: Index up to which extracted content has already been emitted
- `_tool_call_ids`: Generated IDs for each tool call
- `streamed_args_for_tool`: Previously sent arguments for diffing
- `prev_tool_call_arr`: Previous tool call state
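These fields are enough to turn a full re-parse into streamed deltas. The sketch below keeps one "already sent" string per tool call and emits only the suffix that has not been sent yet. The class name `ToolCallStreamState`, the `args_delta()` method, and the `call_` id format are hypothetical; only the field roles mirror the list above. It assumes the arguments builder is prefix-stable (it only ever appends), so a rebuilt string that does not extend the sent one is treated as an anomaly and skipped.

```python
import uuid


class ToolCallStreamState:
    """Minimal per-request state for re-parse-and-diff streaming (sketch)."""

    def __init__(self):
        self.tool_call_ids = []           # like _tool_call_ids
        self.streamed_args_for_tool = []  # like streamed_args_for_tool

    def _ensure(self, index):
        # The re-parse found a tool call not seen before: give it an id and
        # an empty "already sent" buffer.
        while len(self.tool_call_ids) <= index:
            self.tool_call_ids.append(f"call_{uuid.uuid4().hex[:24]}")
            self.streamed_args_for_tool.append("")

    def args_delta(self, index, args_so_far):
        """Return only the characters of args_so_far not streamed before."""
        self._ensure(index)
        sent = self.streamed_args_for_tool[index]
        if not args_so_far.startswith(sent):
            # Text already sent to the client cannot be retracted; this sketch
            # skips the update rather than emit inconsistent JSON.
            return ""
        self.streamed_args_for_tool[index] = args_so_far
        return args_so_far[len(sent):]


state = ToolCallStreamState()
for snapshot in ['{"location": "杭', '{"location": "杭州"', '{"location": "杭州"}']:
    print(repr(state.args_delta(0, snapshot)))
# '{"location": "杭'
# '州"'
# '}'
```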
## Troubleshooting
### Common Issues
**Tool calls not detected**:
- Ensure the DSML tags are correctly formatted
- Verify `skip_special_tokens=False` in the request
- Check that the tool call format matches the expected pattern
**Streaming hangs**:
- Verify the closing tags are present in the model output
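- Check for a partial tag at the end of the output: text that could be the start of a DSML tag is held back until it can be classified, so a truncated or malformed tag can make the stream appear to stall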
**Type conversion errors**:
- Ensure your tool schema defines the correct parameter types
- Verify that string parameters are marked with `string="true"`
## Support
For issues and questions, please use the project's issue tracker.
## Related Projects
- [vLLM](https://github.com/vllm-project/vllm): The main vLLM project
- [DeepSeek](https://github.com/deepseek-ai): DeepSeek AI models
- [MTP](https://github.com/vllm-project/vllm): Multi-Token Prediction implementation
## Changelog
### v0.19.0
- Initial release with re-parse-and-diff architecture
- Full support for DeepSeek-V3.2 DSML format
- Jenkins pipeline integration
- Docker build and deployment support
## Roadmap
- Performance optimizations for very long tool calls
- Additional validation and error handling
- Support for more parameter types
- Integration with additional vLLM features