diff --git a/Jenkinsfile b/Jenkinsfile
index a3b639f..74469ec 100644
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -8,7 +8,7 @@ pipeline {
 
     parameters {
         string(name: 'IMAGE_TAG', defaultValue: 'v0.19.0', description: 'Docker image tag')
-        string(name: 'GIT_REPO', defaultValue: '', description: 'Git repository URL (optional, uses workspace if empty)')
+        string(name: 'GIT_REPO', defaultValue: 'https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git', description: 'Git repository URL (optional, uses workspace if empty)')
         string(name: 'GIT_BRANCH', defaultValue: 'master', description: 'Git branch to build')
     }
 
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..86faaa4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,192 @@
+# vLLM DeepSeek-V3.2 MTP Tool Parser
+
+A robust tool call parser for the DeepSeek-V3.2 DSML format, designed to handle multi-token deltas from MTP (Multi-Token Prediction) and EAGLE speculative decoding.
+
+## Overview
+
+This project provides a drop-in replacement for the standard vLLM tool parser that is resilient to multi-token streaming. Instead of advancing an incremental parse state token by token, it re-parses the entire current text on every call, finds all tool call regions, builds the JSON arguments, and emits only the characters added since the previous call. This makes it robust to variable token arrival rates.
+
+## Features
+
+- **Re-parse-and-diff approach**: Re-parses the entire text on every streaming call for correctness
+- **Multi-token delta support**: Handles any number of tokens arriving per step
+- **Complete and partial tool call handling**: Streams both complete and in-progress tool calls
+- **JSON argument construction**: Builds valid JSON arguments from parameter tags
+- **Schema-aware type conversion**: Converts parameter values according to the tool's schema
+- **Content extraction**: Extracts non-tool-call text without swallowing or duplicating content
+
+## Installation
+
+### Prerequisites
+- Docker
+- Access to a vLLM-compatible environment
+- Python 3.12+
+
+### Building the Docker Image
+
+```bash
+# Build the image
+docker build -t vllm-deepseek-v32-mtp:v0.19.0 .
+
+# Or use the provided Jenkins pipeline (see below)
+```
+
+## Usage
+
+### As a Drop-in Replacement
+
+The parser implements the same interface as the standard vLLM tool parser:
+
+```python
+from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser
+
+# `tokenizer` and `tools` must be provided by the caller
+parser = DeepSeekV32ToolParser(tokenizer, tools)
+```
+
+### In Streaming Mode
+
+On every streaming call, the parser:
+1. Re-scans the current text for content outside tool-call regions
+2. Finds all `<|DSML|invoke>` regions (complete and partial)
+3. Builds the JSON arguments for each region and diffs them against what was previously sent
+4. Emits only the new content
+
+## Tool Call Format
+
+The parser expects the DeepSeek-V3.2 DSML format:
+
+```
+<|DSML|function_calls>
+<|DSML|invoke name="get_weather">
+<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>
+<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>
+</|DSML|invoke>
+</|DSML|function_calls>
+```
+
+## Jenkins Pipeline
+
+The project includes a Jenkinsfile for CI/CD. The pipeline:
+
+1. Checks out the repository
+2. Builds the Docker image
+3. Pushes the image to the configured registry
+
+### Pipeline Parameters
+
+- `IMAGE_TAG`: Docker image tag (default: `v0.19.0`)
+- `GIT_REPO`: Git repository URL (default: `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`; if set to empty, the workspace is used)
+- `GIT_BRANCH`: Git branch to build (default: `master`)
+
+### Environment Variables
+
+- `REGISTRY`: `atl.vultrcr.com/vllm`
+- `IMAGE_NAME`: `vllm-deepseek-v32-mtp`
+
+### Credentials
+
+The pipeline requires Docker registry credentials stored in Jenkins under the credentials ID `ATL_VCR_VLLM`.
+
+## Configuration
+
+### Jenkins Setup
+
+1. Create a new pipeline job named `vllm-deepseek-v32-mtp`
+2. Configure it to pull from `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`
+3. Set up the `ATL_VCR_VLLM` credentials in Jenkins
+4. Run the pipeline
+
+### Manual Build
+
+```bash
+# Set your registry credentials
+export DOCKER_REGISTRY_USER=your_user
+export DOCKER_REGISTRY_PASS=your_pass
+
+# Log in to the registry; pushing to a private registry needs an authenticated session
+echo "$DOCKER_REGISTRY_PASS" | docker login atl.vultrcr.com -u "$DOCKER_REGISTRY_USER" --password-stdin
+
+# Build and push
+docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
+docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0
+```
+
+## Development
+
+### Testing
+
+The parser includes unit tests covering:
+- Content extraction with partial tag overlaps
+- Invoke region detection (complete and incomplete)
+- JSON argument construction
+- Type conversion according to schema
+- Streaming delta computation
+
+### Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Implement your changes
+4. Add tests
+5. Submit a pull request
+
+## License
+
+Apache 2.0 - See [LICENSE](LICENSE) for details.
+
+## Architecture
+
+### Key Components
+
+- **`_extract_content()`**: Extracts non-tool-call text while handling partial tag overlaps
+- **`_extract_invoke_regions()`**: Finds both complete and incomplete invoke blocks
+- **`_build_args_json_so_far()`**: Builds the JSON arguments string from the parameters parsed so far
+- **`_compute_args_diff()`**: Computes the characters added to the arguments string since the last call, so only those are emitted
+- **`extract_tool_calls_streaming()`**: Main entry point that orchestrates the re-parse-and-diff process
+
+### State Management
+
+The parser maintains minimal state between calls:
+- `_sent_content_idx`: Position up to which content has already been emitted
+- `_tool_call_ids`: Generated IDs for each tool call
+- `streamed_args_for_tool`: Previously sent arguments, used for diffing
+- `prev_tool_call_arr`: Previous tool call state
+
+## Troubleshooting
+
+### Common Issues
+
+**Tool calls not detected**:
+- Ensure the DSML tags are correctly formatted
+- Verify that the request sets `skip_special_tokens=False`
+- Check that the tool call format matches the expected pattern
+
+**Streaming hangs**:
+- Verify that the closing tags are present in the model output
+- Check for partial tag overlaps that may cause the parser to wait for more text
+
+**Type conversion errors**:
+- Ensure your tool schema defines the correct parameter types
+- Verify that string parameters are marked with `string="true"`
+
+## Support
+
+For issues and questions, please use the project's issue tracker.
+
+## Related Projects
+
+- [vLLM](https://github.com/vllm-project/vllm): The main vLLM project
+- [DeepSeek](https://github.com/deepseek-ai): DeepSeek AI models
+- [MTP](https://github.com/vllm-project/vllm): Multi-Token Prediction implementation
+
+## Changelog
+
+### v0.19.0
+- Initial release with re-parse-and-diff architecture
+- Full support for DeepSeek-V3.2 DSML format
+- Jenkins pipeline integration
+- Docker build and deployment support
+
+## Roadmap
+
+- Performance optimizations for very long tool calls
+- Additional validation and error handling
+- Support for more parameter types
+- Integration with additional vLLM features
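The re-parse-and-diff loop described in the README's Overview and "In Streaming Mode" sections can be illustrated with a minimal, self-contained sketch. It is not the project's `DeepSeekV32ToolParser`: the class `ReparseDiffSketch`, the helpers `build_args_so_far` and `convert_value`, the regexes, the `SAMPLE` text, and the delta dict shape are all hypothetical. Type conversion is deliberately simplified (no tool schema: `string="true"` keeps the raw text, anything else is tried as JSON), and the tags use the ASCII `|` spelling shown in the README, which may differ from the model's actual tag bytes.

```python
import json
import re

# Hypothetical sketch, not the project's DeepSeekV32ToolParser. Tags use the
# ASCII "|" spelling from the README; the model's real tag bytes may differ.
INVOKE_RE = re.compile(
    r'<\|DSML\|invoke name="([^"]+)">(.*?)(</\|DSML\|invoke>|\Z)', re.DOTALL
)
PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="([^"]+)" string="(true|false)">'
    r"(.*?)</\|DSML\|parameter>",
    re.DOTALL,
)


def convert_value(raw: str, is_string: bool):
    """Assumed simplification of schema-aware conversion: no tool schema used."""
    if is_string:
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def build_args_so_far(body: str, complete: bool) -> str:
    """Build a JSON arguments prefix from the parameters closed so far."""
    parts = []
    for name, flag, raw in PARAM_RE.findall(body):
        value = convert_value(raw, flag == "true")
        key_json = json.dumps(name, ensure_ascii=False)
        value_json = json.dumps(value, ensure_ascii=False)
        parts.append(f"{key_json}: {value_json}")
    args = "{" + ", ".join(parts)
    # The object stays open until </|DSML|invoke> arrives, so each rebuilt
    # string extends the previous one and the diff is a plain suffix.
    return args + "}" if complete else args


class ReparseDiffSketch:
    """Re-parses the full text on every call and emits only new characters."""

    def __init__(self) -> None:
        # Arguments already emitted per tool call (cf. streamed_args_for_tool).
        self.streamed_args: list[str] = []

    def step(self, current_text: str) -> list[dict]:
        deltas: list[dict] = []
        for index, match in enumerate(INVOKE_RE.finditer(current_text)):
            name, body, closer = match.groups()
            args = build_args_so_far(body, complete=bool(closer))
            if index == len(self.streamed_args):
                # First time this invoke region is seen: announce its name.
                self.streamed_args.append("")
                deltas.append({"index": index, "name": name})
            previous = self.streamed_args[index]
            if len(args) > len(previous) and args.startswith(previous):
                deltas.append({"index": index, "arguments": args[len(previous):]})
                self.streamed_args[index] = args
        return deltas


SAMPLE = (
    "<|DSML|function_calls>\n"
    '<|DSML|invoke name="get_weather">\n'
    '<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>\n'
    '<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>\n'
    "</|DSML|invoke>\n"
    "</|DSML|function_calls>"
)

if __name__ == "__main__":
    streamer = ReparseDiffSketch()
    emitted = []
    cut = 0
    for size in (5, 40, 3, 70, 11, 200):  # uneven chunk sizes, like MTP/EAGLE steps
        cut = min(len(SAMPLE), cut + size)
        emitted.extend(streamer.step(SAMPLE[:cut]))
    arguments = "".join(d.get("arguments", "") for d in emitted)
    print(json.loads(arguments))
```

Because the sketch only ever appends to a JSON prefix, feeding the text one character at a time or all at once yields the same concatenated arguments string, which is the property that makes the approach insensitive to how many tokens arrive per step.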
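The partial-tag-overlap handling the README attributes to `_extract_content()` can be sketched on its own. This is an assumption-laden illustration, not the project's code: `safe_content_end`, `ContentStreamer`, and `TOOL_START` are hypothetical names, the tag uses the README's ASCII `|` spelling, and only content before the first `<|DSML|function_calls>` is handled. The idea is to re-scan the whole text on every call and hold back any trailing characters that could still turn out to be the start of the tool-call tag.

```python
# Hypothetical helper names; the tag uses the README's ASCII "|" spelling.
TOOL_START = "<|DSML|function_calls>"


def safe_content_end(text: str, tag: str = TOOL_START) -> int:
    """Return the index up to which `text` is safe to emit as plain content."""
    start = text.find(tag)
    if start != -1:
        return start  # content ends where the tool-call block begins
    # Hold back the longest suffix that could still grow into the tag.
    for k in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:k]):
            return len(text) - k
    return len(text)


class ContentStreamer:
    """Emits content deltas by re-scanning the full text on every call."""

    def __init__(self) -> None:
        self.sent_idx = 0  # plays the role of `_sent_content_idx`

    def step(self, current_text: str) -> str:
        end = safe_content_end(current_text)
        if end <= self.sent_idx:
            return ""
        delta = current_text[self.sent_idx:end]
        self.sent_idx = end
        return delta


if __name__ == "__main__":
    s = ContentStreamer()
    print(repr(s.step("Sure, checking now <|DS")))
    print(repr(s.step("Sure, checking now <|DSML|function_calls>")))
```

In the demo, the first call returns only the text before `<|DS`, and the second returns an empty string because the held-back characters turned out to open the tool-call block. If the held-back suffix later stops matching the tag (for example `<|DSx`), it is released as ordinary content, so nothing is swallowed or duplicated.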