# Reference Implementations

This directory contains **read-only** reference implementations from official sources. Do not modify these files; they exist to cross-check our production pipeline.

## Directory Layout

```
reference/
├── vllm/                                      # vLLM project reference (Apache-2.0)
│   ├── tokenizers/
│   │   ├── deepseek_v4.py                     # Tokenizer wrapper: apply_chat_template for DSV4
│   │   └── deepseek_v4_encoding.py            # Official prompt encoder (canonical source)
│   ├── reasoning/
│   │   ├── deepseek_v3_reasoning_parser.py    # Thinking-mode dispatcher
│   │   └── deepseek_r1_reasoning_parser.py    # <think>/</think> reasoning token parser
│   └── tool_parsers/
│       ├── deepseekv4_tool_parser.py          # DSML tool call parser (V4)
│       └── deepseekv32_tool_parser.py         # DSML tool call parser (V3.2 base)
│
└── official_inference/                        # Reference inference code shipped with the original weights
    ├── generate.py                            # Official generate loop + encode_messages usage
    ├── model.py                               # BF16/FP8 model implementation
    ├── kernel.py                              # Reference CUDA kernels
    ├── convert.py                             # Weight conversion
    └── config.json                            # Model config (small variant)
```

## Key Files for Our Pipeline

1. **`vllm/tokenizers/deepseek_v4_encoding.py`**: Canonical prompt encoder. Already copied to `encoding/deepseek_v4_encoding.py` in the repo root (our live import). If vLLM updates this file, diff the two copies and sync ours.

2. **`vllm/tokenizers/deepseek_v4.py`**: Shows how vLLM wraps the tokenizer to add `apply_chat_template` support. Key insight: it calls `encode_messages(messages, thinking_mode=..., ...)`, then `tokenizer.encode(prompt_str, add_special_tokens=False)`. This is exactly what our `single_shot` does.

3. **`official_inference/generate.py`**: The inference entry point shipped with the original weights. Uses `tokenizer.encode(encode_messages(messages, thinking_mode="chat"))` (default `add_special_tokens=True`, unlike the vLLM wrapper above) and `parse_message_from_completion_text()` for output parsing.

4. **`vllm/reasoning/`**: How vLLM detects thinking-mode boundaries (`<think>` start, `</think>` end). Useful when we integrate streaming.

5. **`vllm/tool_parsers/`**: DSML tool call parsing for future tool-use support.
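
The "diff and sync" step for the prompt encoder (item 1) could be automated along the following lines. This is a minimal sketch, not an existing script in this repo: the function name `encoder_drift` is hypothetical, and the default paths assume the script runs from the repo root with `reference/` located there.

```python
import difflib
from pathlib import Path
from typing import List

# Paths as described in this README. Assumption: the current working
# directory is the repo root and reference/ sits directly under it.
REFERENCE = Path("reference/vllm/tokenizers/deepseek_v4_encoding.py")
LIVE = Path("encoding/deepseek_v4_encoding.py")


def encoder_drift(reference: Path = REFERENCE, live: Path = LIVE) -> List[str]:
    """Return unified-diff lines between the two copies (empty list if identical)."""
    ref_lines = reference.read_text(encoding="utf-8").splitlines(keepends=True)
    live_lines = live.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            ref_lines,
            live_lines,
            fromfile=str(reference),
            tofile=str(live),
        )
    )
```

Calling `encoder_drift()` and failing a CI job whenever the result is non-empty would flag the moment the live copy and the reference diverge; printing the returned lines shows what needs to be synced.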
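
To illustrate the thinking-mode boundary detection mentioned in item 4, here is a minimal non-streaming sketch. It is not the vLLM parser: the function `split_reasoning` is hypothetical, and it assumes the boundary markers are the literal strings `<think>` and `</think>` and that the full completion text is already available.

```python
from typing import Optional, Tuple

THINK_START = "<think>"   # assumed start-of-reasoning marker
THINK_END = "</think>"    # assumed end-of-reasoning marker


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Split a finished completion into (reasoning, answer).

    Returns (None, text) unchanged when no end marker is present.
    """
    # The start marker may be part of the prompt rather than the output,
    # so it is treated as optional here.
    body = text[len(THINK_START):] if text.startswith(THINK_START) else text
    reasoning, sep, answer = body.partition(THINK_END)
    if not sep:
        # No end marker found: do not guess, hand back the original text.
        return None, text
    return reasoning, answer
```

A streaming integration would need to handle markers split across chunks, which this sketch deliberately leaves out; the files under `vllm/reasoning/` are the place to check how that case is handled in practice.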