deepseek-v4-quant/README_modelopt_nvfp4.md
biondizzle ef89ceffbd Add ModelOpt NVFP4 pipeline: patch, run script, README
- Patch fixes iter_weights_for_calibration() for DeepseekV4Experts
  (ModuleList quantizers vs singular)
- Run script uses official NVIDIA hf_ptq.py with FP8 source
- Documents flags to avoid (--low_memory_mode, wrong arg names)
2026-05-07 07:22:54 +00:00

DeepSeek V4 Pro NVFP4 via NVIDIA ModelOpt

What this does

Quantizes DeepSeek V4 Pro (FP8 weights) to full NVFP4 format using NVIDIA's official ModelOpt pipeline. Target output size: ~600 GB, versus 840 GB from the custom Path A converter.

Prerequisites

  • B200 node (8× B200, 2.7TB RAM) — NVFP4 requires Blackwell GPUs
  • modelopt 0.45.0+ from git
  • transformers 5.8.0.dev0 (for DeepSeekV4 support)
  • kernels package (for FP8 dequantization during calibration)
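
The version floors above can be checked before starting a long run. The following is a minimal sketch, not part of this repository: `version_ge` and `check_min_version` are hypothetical helper names, and the comparison assumes GNU `sort -V` is available.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes GNU `sort -V` (version sort). Suffixes such as ".dev0" get no
# special treatment, so borderline pre-release versions need a manual look.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper: report whether an installed version meets a minimum.
#   $1 = package label, $2 = installed version, $3 = minimum version
check_min_version() {
    name="$1"
    installed="$2"
    minimum="$3"
    if version_ge "$installed" "$minimum"; then
        echo "ok: $name $installed"
    else
        echo "too old: $name $installed (need $minimum+)" >&2
        return 1
    fi
}
```

The installed version string has to be supplied by the caller, for example from `pip show` output; how each package reports its version is left as an assumption here.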

Critical Patch

ModelOpt has a bug affecting DeepseekV4Experts: the iter_weights_for_calibration() method expects a singular weight quantizer and doesn't handle ModuleList quantizers (plural gate_up_proj_weight_quantizers). Apply the patch before running:

cp patches/quant_module_patched.py <venv-path>/lib/python3.10/site-packages/modelopt/torch/quantization/nn/modules/quant_module.py
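
Because the `cp` above overwrites an installed library file, it can help to keep a copy of the original first. The following is a minimal sketch under that assumption; `apply_patch_with_backup` is a hypothetical helper, not something shipped in this repository, and the `.orig` suffix is an arbitrary choice.

```shell
# Hypothetical helper: back up a file once, then overwrite it with a patch.
#   $1 = patched file (e.g. patches/quant_module_patched.py)
#   $2 = installed file to replace (the quant_module.py path shown above)
apply_patch_with_backup() {
    patch_src="$1"
    target="$2"
    [ -f "$patch_src" ] || { echo "missing patch: $patch_src" >&2; return 1; }
    [ -f "$target" ] || { echo "missing target: $target" >&2; return 1; }
    # Keep only the first backup so repeated runs never clobber the original.
    [ -f "$target.orig" ] || cp "$target" "$target.orig"
    cp "$patch_src" "$target"
}
```

Restoring the unpatched file is then a matter of copying `quant_module.py.orig` back over `quant_module.py`.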

Do NOT use these flags

  • --low_memory_mode: causes meta device error with V4
  • --calib_size: wrong arg name (use --calib)
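
If the run script is edited by hand, a small guard can catch the two flags above before a multi-hour job starts. This is a sketch only: `check_ptq_flags` is a hypothetical function, not part of scripts/run_modelopt_nvfp4.sh, and it only knows about the flags listed in this README.

```shell
# Hypothetical guard: reject the flags this README warns against.
# Pass it the same arguments that would be handed to the PTQ script.
check_ptq_flags() {
    for arg in "$@"; do
        case "$arg" in
            --low_memory_mode|--low_memory_mode=*)
                echo "refusing --low_memory_mode: causes meta device error with V4" >&2
                return 1 ;;
            --calib_size|--calib_size=*)
                echo "refusing --calib_size: wrong arg name, use --calib" >&2
                return 1 ;;
        esac
    done
    return 0
}
```

Calling `check_ptq_flags "$@" || exit 1` near the top of a wrapper script would stop the run early when either flag is present.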

Run

bash scripts/run_modelopt_nvfp4.sh

Output

/root/nvidia-meeting/modelopt-repo/examples/llm_ptq/saved_models_DeepSeek-V4-Pro-FP8_nvfp4_kv_fp8_cast

Notes

  • Use FP8 source (DeepSeek-V4-Pro-FP8), NOT mixed-precision BF16 (DeepSeek-V4-Pro)
  • V4's mixed precision causes erratic behavior in the pipeline; the FP8 checkpoint is clean
  • Calibration takes several hours for 256 samples with CPU offload (--use_seq_device_map)