biondizzle ef89ceffbd Add ModelOpt NVFP4 pipeline: patch, run script, README

- Patch fixes iter_weights_for_calibration() for DeepseekV4Experts
  (ModuleList quantizers vs singular)
- Run script uses official NVIDIA hf_ptq.py with FP8 source
- Documents flags to avoid (--low_memory_mode, wrong arg names)

2026-05-07 07:22:54 +00:00

DeepSeek V4 Pro NVFP4 via NVIDIA ModelOpt

What this does

Quantizes DeepSeek V4 Pro (FP8 weights) to full NVFP4 format using NVIDIA's official ModelOpt pipeline. Target output size: ~600 GB (vs. 840 GB from the custom Path A converter).

Prerequisites

B200 node (8× B200, 2.7 TB RAM); NVFP4 requires Blackwell GPUs
modelopt 0.45.0 or later, installed from git
transformers 5.8.0.dev0 (for DeepSeekV4 support)
kernels package (for FP8 dequantization during calibration)
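
The prerequisites above can be checked before starting a multi-hour run. The following is a minimal sketch, not part of the pipeline: the distribution names (`nvidia-modelopt`, `transformers`, `kernels`) are assumptions about how the packages are registered in your environment, and the comparison only looks at the leading numeric part of each version, so `5.8.0.dev0` is treated as satisfying `5.8.0`.

```python
import re
from importlib import metadata

# Assumed distribution names; adjust if your install registers them differently.
REQUIRED = {
    "nvidia-modelopt": "0.45.0",
    "transformers": "5.8.0",
    "kernels": None,  # any installed version is accepted
}


def numeric_prefix(version):
    """Leading numeric release components, e.g. '5.8.0.dev0' -> (5, 8, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.group() != piece:
            break  # stop at suffixes such as '0rc1' or '0+g1234'
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numeric prefixes, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(required=REQUIRED, get_version=metadata.version):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum and not meets_minimum(installed, minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all prerequisites satisfied"]:
        print(line)
```

This checks only Python packages; it does not verify the GPU type or available RAM.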

Critical Patch

modelopt has a bug with DeepseekV4Experts: the iter_weights_for_calibration() method assumes a single weight quantizer and doesn't handle ModuleList quantizers (the plural gate_up_proj_weight_quantizers). Apply the patch before running:

cp patches/quant_module_patched.py <venv-path>/lib/python3.10/site-packages/modelopt/torch/quantization/nn/modules/quant_module.py
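
The `cp` command above overwrites a file inside the installed package, so it can be worth keeping the original. Below is a small sketch of the same copy with a one-time backup; `apply_patch` and the `.orig` suffix are hypothetical names chosen for illustration, not something the repo ships.

```python
import shutil
from pathlib import Path


def apply_patch(patched_file, target_file, backup_suffix=".orig"):
    """Copy patched_file over target_file, saving the original once.

    Returns the path of the backup file.
    """
    patched_file = Path(patched_file)
    target_file = Path(target_file)

    if not patched_file.is_file():
        raise FileNotFoundError(f"patch file not found: {patched_file}")
    if not target_file.is_file():
        # Usually means the venv path or Python version in the path is wrong.
        raise FileNotFoundError(f"target not found: {target_file}")

    backup = target_file.with_name(target_file.name + backup_suffix)
    if not backup.exists():
        # Only back up the first time, so re-applying the patch
        # never replaces the pristine copy with a patched one.
        shutil.copy2(target_file, backup)

    shutil.copy2(patched_file, target_file)
    return backup
```

Call it with `patches/quant_module_patched.py` as the first argument and the `quant_module.py` path under your venv's `site-packages` as the second. To undo the patch, copy the `.orig` file back over `quant_module.py`.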

Do NOT use these flags

--low_memory_mode: causes meta device error with V4
--calib_size: wrong arg name (use --calib)
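
If you edit the run script, a quick guard can catch these flags before a long job starts. This is only a sketch: the flag list is copied from the two entries above, and `find_bad_flags` is a hypothetical helper, not part of ModelOpt or this repo.

```python
import shlex

# Flags this README says to avoid, with the stated reason.
FLAGS_TO_AVOID = {
    "--low_memory_mode": "causes a meta device error with V4",
    "--calib_size": "wrong argument name; use --calib",
}


def find_bad_flags(argv):
    """Return one message per listed flag found in argv."""
    problems = []
    for arg in argv:
        flag = arg.split("=", 1)[0]  # also catches the --flag=value form
        if flag in FLAGS_TO_AVOID:
            problems.append(f"{flag}: {FLAGS_TO_AVOID[flag]}")
    return problems


def check_command(command):
    """Split a shell-style command string and check its arguments."""
    return find_bad_flags(shlex.split(command))
```

For example, `check_command("hf_ptq.py --calib_size 256")` returns a single message pointing at `--calib`.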

Run

bash scripts/run_modelopt_nvfp4.sh

Output

/root/nvidia-meeting/modelopt-repo/examples/llm_ptq/saved_models_DeepSeek-V4-Pro-FP8_nvfp4_kv_fp8_cast
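
To compare the result against the ~600 GB target, the size of the output directory can be summed once the export finishes. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes); the function names are illustrative, and the 600 and 840 defaults are the figures quoted in this README.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Total size of all regular files under root, in bytes."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def summarize(root, target_gb=600.0, baseline_gb=840.0):
    """One-line size report against the README's target and baseline."""
    size_gb = directory_size_bytes(root) / 1e9
    return (
        f"{size_gb:.1f} GB on disk "
        f"(target ~{target_gb:.0f} GB, Path A baseline {baseline_gb:.0f} GB)"
    )
```

Pass the output directory above to `summarize` and print the result.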

Notes

Use the FP8 source (DeepSeek-V4-Pro-FP8), NOT the mixed-precision BF16 checkpoint (DeepSeek-V4-Pro)
V4's mixed precision causes "wonky shit"; FP8 is clean
Calibration with CPU offload (--use_seq_device_map) takes several hours for 256 samples
