biondizzle 02b8ea536f Update MEMORY.md and memory files with vLLM NVFP4 serving progress
Server running on B200 port 8000 with full NVFP4→vLLM bridge.
All critical bugs fixed: DeepGEMM scale format, compressor shapes, block scale values.
2026-05-11 02:02:49 +00:00

# MEMORY.md — Long-Term Memory
## Mike
- Working on DeepSeek V4 Pro NVFP4 quantization + vLLM serving on a B200 node
- B200 node: 45.76.247.107, root, password in project .env
- Repo: https://sweetapi.com/biondizzle/deepseek-v4-quant.git (modelopt-nvfp4 branch)
## DeepSeek V4 NVFP4 Project
- Successfully quantized: 881GB NVFP4 (Run 11), 8× B200, $161/run
- modelopt 0.45.0.dev64 + transformers 5.8.0.dev0
- **vLLM server running on B200 port 8000** as of May 11, 2026 🎉
- We built the entire NVFP4→vLLM bridge from scratch (NVIDIA hasn't done this)
- Abandoned mega_moe (no kernel, format mismatch), using standard FusedMoE instead
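Part of that bridge is dequantizing NVFP4 weights for layers that do not run in FP4. The following is a minimal pure-Python sketch of the decode step. It assumes the usual NVFP4 layout: 4-bit E2M1 codes packed two per byte, one FP8 E4M3 block scale per 16 elements (already converted to floats here), and one FP32 per-tensor scale. The low-nibble-first packing order is an assumption, and `decode_e2m1` / `dequantize_nvfp4_row` are hypothetical helpers, not code from `patches/deepseek_v4.py`.

```python
# Hypothetical sketch of NVFP4 -> float decoding (not the repo's code).
# Assumed layout: two E2M1 codes per byte, low nibble = earlier element;
# one FP8 E4M3 block scale per 16 elements (passed in as Python floats);
# one FP32 per-tensor scale.

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
NVFP4_BLOCK = 16  # elements sharing one block scale


def decode_e2m1(code: int) -> float:
    """Map a 4-bit E2M1 code (sign bit + 3 magnitude bits) to its value."""
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def dequantize_nvfp4_row(packed, block_scales, global_scale):
    """Unpack one row of NVFP4 bytes and apply block and per-tensor scales."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)         # low nibble first (assumed order)
        codes.append((byte >> 4) & 0x0F)  # then the high nibble
    if len(block_scales) * NVFP4_BLOCK != len(codes):
        raise ValueError("expected one block scale per 16 elements")
    return [
        decode_e2m1(code) * block_scales[i // NVFP4_BLOCK] * global_scale
        for i, code in enumerate(codes)
    ]
```

For example, `dequantize_nvfp4_row([0x21] * 8, [2.0], 0.5)` decodes codes 1 and 2 (0.5 and 1.0) and multiplies each by 2.0 × 0.5, giving `[0.5, 1.0] * 8`. A GPU implementation would perform the same lookup and multiply on tensors before casting to BF16.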
### Key Technical Decisions
- **wo_a**: NVFP4→BF16→FP8 with DeepGEMM block-scale format for BMM einsum
- **Attention layers**: NVFP4→BF16 dequantization, UnquantizedLinearMethod
- **Compressor**: Reconstructed fused_wkv_wgate from separate kv_proj+gate_proj in checkpoint
- **MoE experts**: Stay NVFP4, use FLASHINFER_TRTLLM FusedMoE backend
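Reconstructing `fused_wkv_wgate` depends on pairing each `kv_proj` with the `gate_proj` that shares its full key prefix. The following is a minimal plain-Python sketch. It assumes both projections are already dequantized to row-major weights and that the fused tensor stacks `kv_proj` rows before `gate_proj` rows. The key suffixes, the example module paths, and `fuse_compressor_weights` itself are hypothetical. Matching on the full prefix also keeps a compressor nested under an `.indexer.` sub-path separate from any other compressor in the same layer.

```python
# Hypothetical key layout: "<prefix>.kv_proj.weight" + "<prefix>.gate_proj.weight"
# -> "<prefix>.fused_wkv_wgate.weight", stacking rows with kv_proj first (assumed).
KV_SUFFIX = ".kv_proj.weight"
GATE_SUFFIX = ".gate_proj.weight"
FUSED_SUFFIX = ".fused_wkv_wgate.weight"


def fuse_compressor_weights(state_dict):
    """Return a copy of state_dict with each kv_proj/gate_proj pair fused.

    Weights are lists of rows. Pairs are matched on their full prefix, so a
    compressor under an ".indexer." sub-path is fused with its own gate_proj.
    """
    fused = {}
    for key, value in state_dict.items():
        if key.endswith(GATE_SUFFIX):
            # Consumed together with its kv_proj partner; reject orphans.
            if key[: -len(GATE_SUFFIX)] + KV_SUFFIX not in state_dict:
                raise KeyError(f"gate_proj without kv_proj: {key}")
            continue
        if key.endswith(KV_SUFFIX):
            prefix = key[: -len(KV_SUFFIX)]
            gate_key = prefix + GATE_SUFFIX
            if gate_key not in state_dict:
                raise KeyError(f"missing {gate_key} for {key}")
            gate = state_dict[gate_key]
            if len(value[0]) != len(gate[0]):
                raise ValueError(f"input dims differ under {prefix}")
            fused[prefix + FUSED_SUFFIX] = value + gate  # row-wise concatenation
        else:
            fused[key] = value  # unrelated weights pass through unchanged
    return fused
```

The row-wise list concatenation plays the role of `torch.cat([kv, gate], dim=0)`. If the real fused layout interleaves or orders the halves differently, only the concatenation line would change.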
### Critical Bugs Fixed (May 11)
1. DeepGEMM `sf.dim()` crash: `weight_scale_inv` must be a DeepGEMM-formatted block-scale tensor
2. Compressor indexer shape mismatch: checkpoint keys include an `.indexer.` sub-path
3. All-ones block scale → garbage output: must use `torch.full(..., fp8_scale)`, not `torch.ones`
4. Block scale dtype: must be float32, not float8_e4m3fn
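Bugs 1, 3 and 4 all concern the block-scale tensor that accompanies the FP8 `wo_a` weight. The following plain-Python sketch quantizes a weight with a single per-tensor scale (`amax / 448`, the largest E4M3 magnitude). It rounds that scale to float32 and broadcasts it into a grid with one entry per 128×128 weight block, which is the `torch.full(..., fp8_scale)` idea rather than an all-ones grid. The 128×128 block size follows DeepSeek's FP8 checkpoints and is an assumption here. DeepGEMM's own layout and alignment requirements are not modeled, and all function names are hypothetical.

```python
import math
import struct

FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3FN magnitude
BLOCK = 128           # assumed 128x128 weight blocks (DeepSeek FP8 convention)


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3FN value, saturating at +/-448 (no NaN handling)."""
    a = abs(x)
    if a == 0.0:
        return 0.0
    if a < 2.0 ** -6:
        step = 2.0 ** -9          # subnormal spacing
    else:
        _, e = math.frexp(a)      # a = m * 2**e with 0.5 <= m < 1
        step = 2.0 ** (e - 4)     # 3 mantissa bits below the leading bit
    return math.copysign(min(round(a / step) * step, FP8_E4M3_MAX), x)


def quantize_fp8_with_block_grid(weight):
    """Quantize with one per-tensor scale, broadcast into a float32 block grid."""
    rows, cols = len(weight), len(weight[0])
    amax = max(abs(v) for row in weight for v in row)
    scale = to_float32(amax / FP8_E4M3_MAX) if amax > 0 else 1.0
    codes = [[round_to_e4m3(v / scale) for v in row] for row in weight]
    # Fill every block with the real scale (torch.full-style), never 1.0.
    grid = [[scale] * math.ceil(cols / BLOCK) for _ in range(math.ceil(rows / BLOCK))]
    return codes, grid


def dequantize_fp8_with_block_grid(codes, grid):
    """Multiply each FP8 value by the scale of the 128x128 block containing it."""
    return [
        [v * grid[i // BLOCK][j // BLOCK] for j, v in enumerate(row)]
        for i, row in enumerate(codes)
    ]
```

Dequantizing with an all-ones grid returns the raw E4M3 codes (up to ±448) instead of the original weights, which matches the garbage-output symptom in bug 3. Per-block amax scales would track local ranges more tightly. The uniform grid only reproduces per-tensor scaling inside the block-scale format.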
### Outstanding
- Output quality under investigation; FP4 is an aggressive quantization format
- All code in patches/deepseek_v4.py on modelopt-nvfp4 branch