Add README and update Jenkinsfile with repository URL

Jenkinsfile

@@ -8,7 +8,7 @@ pipeline {
 
 parameters {
 string(name: 'IMAGE_TAG', defaultValue: 'v0.19.0', description: 'Docker image tag')
-string(name: 'GIT_REPO', defaultValue: '', description: 'Git repository URL (optional, uses workspace if empty)')
+string(name: 'GIT_REPO', defaultValue: 'https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git', description: 'Git repository URL (optional, uses workspace if empty)')
 string(name: 'GIT_BRANCH', defaultValue: 'master', description: 'Git branch to build')
 }
 

README.md (new file, 192 lines)

# vLLM DeepSeek-V3.2 MTP Tool Parser

A robust tool call parser for the DeepSeek-V3.2 DSML format, designed to handle the multi-token deltas produced by MTP (Multi-Token Prediction) and EAGLE speculative decoding.

## Overview

This project provides a drop-in replacement for the standard vLLM tool parser that is resilient to multi-token streaming. Instead of maintaining incremental parser state, it re-parses the entire current text on every call, finds all tool-call regions, builds the JSON arguments, and emits only the characters added since the previous call. This makes it robust to a variable number of tokens arriving per step.
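
As a rough illustration of the re-parse-and-diff idea, the sketch below keeps only one piece of state: the output already sent. `ReparseDiffer`, `upper_render`, and `stream` are hypothetical names, and upper-casing stands in for the real parse step; this is not the project's code. Because the output is rebuilt from the full text on every call, the concatenated deltas do not depend on how many tokens arrive per step.

```python
from typing import Callable


class ReparseDiffer:
    """Hypothetical re-parse-and-diff streamer (illustration only).

    On every call the full text received so far is re-rendered from scratch,
    and only the part of the rendering not yet emitted is returned.
    """

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._sent = ""  # everything emitted so far

    def feed(self, current_text: str) -> str:
        rendered = self._render(current_text)
        if not rendered.startswith(self._sent):
            # The rendering is expected to only grow; a real parser would need
            # a recovery strategy here. This sketch simply refuses.
            raise ValueError("rendered output is not an extension of what was sent")
        delta = rendered[len(self._sent):]
        self._sent = rendered
        return delta


def upper_render(text: str) -> str:
    # Hypothetical stand-in for "parse everything and build the output so far".
    return text.upper()


def stream(chunks: list[str]) -> str:
    differ = ReparseDiffer(upper_render)
    current = ""
    out = []
    for chunk in chunks:  # a chunk may hold one token or several (MTP / EAGLE)
        current += chunk
        out.append(differ.feed(current))
    return "".join(out)


# One token per step and many tokens per step give the same result.
print(stream(["he", "llo", " wor", "ld"]))  # HELLO WORLD
print(stream(["hello world"]))              # HELLO WORLD
```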

## Features

- **Re-parse-and-diff approach**: Re-parses the entire text on every streaming call for correctness
- **Multi-token delta support**: Handles any number of tokens arriving per step
- **Complete and partial tool call handling**: Streams both complete and in-progress tool calls
- **JSON argument construction**: Builds proper JSON arguments from parameter tags
- **Schema-aware type conversion**: Converts parameter values according to the tool schema
- **Content extraction**: Extracts non-tool-call text without swallowing or duplicating content

## Installation

### Prerequisites
- Docker
- Access to a vLLM-compatible environment
- Python 3.12+

### Building the Docker Image

```bash
# Build the image
docker build -t vllm-deepseek-v32-mtp:v0.19.0 .

# Or use the provided Jenkins pipeline (see below)
```

## Usage

### As a Drop-in Replacement

The parser implements the same interface as the standard vLLM tool parser:

```python
from vllm.tool_parsers.deepseekv32_tool_parser import DeepSeekV32ToolParser

parser = DeepSeekV32ToolParser(tokenizer, tools)
```

### In Streaming Mode

The parser handles streaming automatically by:
1. Re-scanning the current text for content outside tool-call regions
2. Finding all `<|DSML|invoke>` regions (complete and partial)
3. Building the JSON arguments for each region and diffing them against what was previously sent
4. Emitting only the new content
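
Step 2 can be sketched with a regular expression over the tag spellings shown in this README. `find_invoke_regions` and `InvokeRegion` are hypothetical names, not the project's `_extract_invoke_regions()`; the sketch assumes each invoke block opens with `<|DSML|invoke name="...">` and closes with `</|DSML|invoke>`, and treats a block whose closing tag has not arrived yet as partial.

```python
import re
from typing import NamedTuple

# Tag spellings copied from the format shown in this README (assumption: the
# decoded model output uses exactly these characters).
INVOKE_START = re.compile(r'<\|DSML\|invoke name="([^"]*)">')
INVOKE_END = "</|DSML|invoke>"


class InvokeRegion(NamedTuple):
    name: str
    body: str       # text after the start tag, up to the end tag or end of text
    complete: bool  # True once the closing tag has arrived


def find_invoke_regions(text: str) -> list[InvokeRegion]:
    """Hypothetical helper: return every complete or still-open invoke block."""
    regions = []
    pos = 0
    while (m := INVOKE_START.search(text, pos)) is not None:
        end = text.find(INVOKE_END, m.end())
        if end == -1:
            # No closing tag yet: this block is still streaming.
            regions.append(InvokeRegion(m.group(1), text[m.end():], False))
            break
        regions.append(InvokeRegion(m.group(1), text[m.end():end], True))
        pos = end + len(INVOKE_END)
    return regions


partial = (
    '<|DSML|function_calls>\n'
    '<|DSML|invoke name="a">\n</|DSML|invoke>\n'
    '<|DSML|invoke name="b">\n<|DSML|parameter name="x" string="true">4'
)
for r in find_invoke_regions(partial):
    print(r.name, r.complete)
# a True
# b False
```

A start tag that is itself still incomplete (for example `<|DSML|invoke name="get_wea`) does not match, so the region is only reported once its name has fully arrived.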

## Tool Call Format

The parser expects the DeepSeek-V3.2 DSML format:

```
<|DSML|function_calls>
<|DSML|invoke name="get_weather">
<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>
<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>
```
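
Turning the parameters of one invoke block into a JSON arguments string might look like the sketch below. The conversion rules in `convert_value` are assumptions (for example, that non-string values carry `string="false"` with a JSON-encoded body, and that the tool schema uses JSON Schema type names such as `integer` and `boolean`); the project's own type conversion may differ.

```python
import json
import re
from typing import Any, Optional

# Hypothetical regex for one complete parameter tag in the format above.
PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="([^"]*)" string="(true|false)">(.*?)</\|DSML\|parameter>',
    re.DOTALL,
)


def convert_value(raw: str, is_string: bool, schema_type: Optional[str]) -> Any:
    """Assumed rules: string="true" wins, otherwise follow the schema type."""
    if is_string or schema_type == "string":
        return raw
    try:
        if schema_type == "integer":
            return int(raw)
        if schema_type == "number":
            return float(raw)
        if schema_type == "boolean":
            return raw.strip().lower() == "true"
        return json.loads(raw)  # objects, arrays, or untyped values
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw  # keep the raw text rather than failing the whole call


def build_args_json(invoke_body: str, properties: dict[str, dict]) -> str:
    """Hypothetical helper: parameters of one invoke block -> JSON string."""
    args = {}
    for name, string_flag, raw in PARAM_RE.findall(invoke_body):
        schema_type = properties.get(name, {}).get("type")
        args[name] = convert_value(raw, string_flag == "true", schema_type)
    return json.dumps(args, ensure_ascii=False)


body = (
    '<|DSML|parameter name="location" string="true">杭州</|DSML|parameter>\n'
    '<|DSML|parameter name="date" string="true">2024-01-16</|DSML|parameter>\n'
)
schema = {"location": {"type": "string"}, "date": {"type": "string"}}
print(build_args_json(body, schema))
# {"location": "杭州", "date": "2024-01-16"}
```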

## Jenkins Pipeline

The project includes a Jenkinsfile for CI/CD. The pipeline:

1. Checks out the repository
2. Builds the Docker image
3. Pushes the image to the registry configured in `REGISTRY`

### Pipeline Parameters

- `IMAGE_TAG`: Docker image tag (default: `v0.19.0`)
- `GIT_REPO`: Git repository URL (default: `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`; if set to an empty string, the existing workspace is used)
- `GIT_BRANCH`: Git branch to build (default: `master`)

### Environment Variables

- `REGISTRY`: `atl.vultrcr.com/vllm`
- `IMAGE_NAME`: `vllm-deepseek-v32-mtp`

### Credentials

The pipeline requires Docker registry credentials stored in Jenkins as `ATL_VCR_VLLM`.

## Configuration

### Jenkins Setup

1. Create a new pipeline job named `vllm-deepseek-v32-mtp`
2. Configure it to pull from `https://sweetapi.com/biondizzle/vllm-deepseek-v32-mtp.git`
3. Set up the `ATL_VCR_VLLM` credentials in Jenkins
4. Run the pipeline
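
If you prefer to trigger the job from a script instead of the Jenkins UI, Jenkins' remote access API accepts a POST to `/job/<name>/buildWithParameters`. The sketch below only builds the request with the Python standard library and does not send it; the Jenkins URL, user, and API token are placeholders, and authenticating with an API token over HTTP Basic auth is an assumption about how your Jenkins instance is configured.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trigger_request(
    jenkins_url: str, job: str, params: dict[str, str], user: str, api_token: str
) -> Request:
    """Build (but do not send) a POST to Jenkins' buildWithParameters endpoint."""
    query = urlencode(params)
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job)}/buildWithParameters?{query}"
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return Request(url, method="POST", headers={"Authorization": f"Basic {token}"})


req = build_trigger_request(
    "https://jenkins.example.com",  # hypothetical Jenkins URL
    "vllm-deepseek-v32-mtp",
    {"IMAGE_TAG": "v0.19.0", "GIT_BRANCH": "master"},
    user="alice",           # hypothetical user
    api_token="API_TOKEN",  # hypothetical Jenkins API token
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen() would actually queue the build.
```

Parameters left out of `params` (here `GIT_REPO`) fall back to the defaults declared in the Jenkinsfile.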

### Manual Build

```bash
# Set your registry credentials
export DOCKER_REGISTRY_USER=your_user
export DOCKER_REGISTRY_PASS=your_pass

# Log in with the exported credentials (--password-stdin keeps the password off the command line)
echo "$DOCKER_REGISTRY_PASS" | docker login atl.vultrcr.com -u "$DOCKER_REGISTRY_USER" --password-stdin

# Build and push
docker build -t atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0 .
docker push atl.vultrcr.com/vllm/vllm-deepseek-v32-mtp:v0.19.0
```

## Development

### Testing

The parser includes unit tests covering:
- Content extraction with partial tag overlaps
- Invoke region detection (complete and incomplete)
- JSON argument construction
- Type conversion according to the schema
- Streaming delta computation

### Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests
5. Submit a pull request

## License

Apache 2.0. See [LICENSE](LICENSE) for details.

## Architecture

### Key Components

- **`_extract_content()`**: Extracts non-tool-call text while handling partial tag overlaps
- **`_extract_invoke_regions()`**: Finds both complete and incomplete invoke blocks
- **`_build_args_json_so_far()`**: Builds the JSON arguments from the parameters parsed so far
- **`_compute_args_diff()`**: Computes and emits only the newly added characters
- **`extract_tool_calls_streaming()`**: Main entry point that orchestrates the re-parse-and-diff process
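
The partial-tag handling described for `_extract_content()` can be approximated as follows: content is safe to emit up to the tool-call start tag, and a trailing fragment that could still grow into that tag (such as `<|DS`) is held back until the next call. `safe_content_end` and `extract_content` are hypothetical helpers, not the project's code, and for brevity they ignore any text that follows the tool-call block.

```python
TOOL_CALLS_START = "<|DSML|function_calls>"  # tag spelling from the format above


def safe_content_end(text: str, tag: str = TOOL_CALLS_START) -> int:
    """Hypothetical helper: index up to which `text` is definitely plain content.

    Stops at the start tag if present; otherwise holds back the longest
    trailing suffix that is a prefix of the tag (e.g. "<|DS").
    """
    idx = text.find(tag)
    if idx != -1:
        return idx
    for k in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:k]):
            return len(text) - k
    return len(text)


def extract_content(text: str, sent_idx: int) -> tuple[str, int]:
    """Return the not-yet-sent content and the updated position tracker."""
    end = safe_content_end(text)
    if end <= sent_idx:
        return "", sent_idx
    return text[sent_idx:end], end


sent = 0
out = []
for current in ["Let me check", "Let me check.<|DS", "Let me check.<|DSML|function_calls>"]:
    delta, sent = extract_content(current, sent)
    out.append(delta)
print(out)  # ['Let me check', '.', '']
```

Here `sent_idx` plays the role that `_sent_content_idx` plays in the parser's state: it only ever moves forward, so no content is emitted twice.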

### State Management

The parser maintains minimal state between calls:
- `_sent_content_idx`: Position tracker for content extraction
- `_tool_call_ids`: Generated IDs for each tool call
- `streamed_args_for_tool`: Previously sent arguments for diffing
- `prev_tool_call_arr`: Previous tool call state
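
A minimal sketch of the argument-diffing state, modeling only the equivalents of `_tool_call_ids` and `streamed_args_for_tool` as attributes of a hypothetical `StreamState` class. One detail worth noting: while a call is still open, the closing `}` of the arguments JSON is withheld, because later parameters will be inserted before it; otherwise the text already sent would stop being a prefix of the next rebuild. The `call_<hex>` ID format is an assumption.

```python
import json
import uuid


class StreamState:
    """Hypothetical per-request state mirroring two of the fields listed above."""

    def __init__(self) -> None:
        self.tool_call_ids: list[str] = []
        self.streamed_args_for_tool: list[str] = []

    def ensure_tool(self, index: int) -> str:
        # Assign a stable ID (assumed format) the first time an index is seen.
        while len(self.tool_call_ids) <= index:
            self.tool_call_ids.append(f"call_{uuid.uuid4().hex[:24]}")
            self.streamed_args_for_tool.append("")
        return self.tool_call_ids[index]

    def args_diff(self, index: int, args: dict, complete: bool) -> str:
        """Return only the characters of the arguments JSON not yet sent."""
        self.ensure_tool(index)
        full = json.dumps(args, ensure_ascii=False)
        # While the call is open, withhold the closing brace: it is not stable.
        target = full if complete else full[:-1]
        sent = self.streamed_args_for_tool[index]
        if not target.startswith(sent):
            return ""  # sketch: never retract text that was already sent
        self.streamed_args_for_tool[index] = target
        return target[len(sent):]


state = StreamState()
steps = [
    ({}, False),
    ({"location": "杭州"}, False),
    ({"location": "杭州", "date": "2024-01-16"}, False),
    ({"location": "杭州", "date": "2024-01-16"}, True),
]
deltas = [state.args_diff(0, args, done) for args, done in steps]
print(deltas)
# ['{', '"location": "杭州"', ', "date": "2024-01-16"', '}']
print("".join(deltas) == json.dumps(steps[-1][0], ensure_ascii=False))  # True
```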

## Troubleshooting

### Common Issues

**Tool calls not detected**:
- Ensure the DSML tags are correctly formatted
- Verify that the request sets `skip_special_tokens=False`
- Check that the tool call format matches the expected pattern

**Streaming hangs**:
- Verify that the closing tags are present in the model output
- Check for partial tag overlaps that may be causing the parser to hold back output

**Type conversion errors**:
- Ensure your tool schema defines the correct parameter types
- Verify that string parameters are marked with `string="true"`

## Support

For issues and questions, please use the project's issue tracker.

## Related Projects

- [vLLM](https://github.com/vllm-project/vllm): The main vLLM project
- [DeepSeek](https://github.com/deepseek-ai): DeepSeek AI models
- [MTP](https://github.com/vllm-project/vllm): Multi-Token Prediction implementation (part of the vLLM repository)

## Changelog

### v0.19.0
- Initial release with re-parse-and-diff architecture
- Full support for DeepSeek-V3.2 DSML format
- Jenkins pipeline integration
- Docker build and deployment support

## Roadmap

- Performance optimizations for very long tool calls
- Additional validation and error handling
- Support for more parameter types
- Integration with additional vLLM features