deepseek-v4-quant

Author	SHA1	Message	Date
biondizzle	6eaba26914	Defensive quantization: snapshot amax to CPU immediately after calibration. Key changes: snapshot_amax_to_cpu() copies all quantizer _amax to CPU and saves them to disk (~50MB) right after mtq.quantize() returns, before any other GPU operation can corrupt them; force_all_amax_to_cpu() (nuclear option) moves _pre_quant_scale and _global_amax to CPU too; _FORCE_AMAX_CPU flag + patched amax setter, so after calibration any future amax writes go to CPU instead of GPU; --validate-only mode to check saved state without running anything; restore_amax_from_snapshot() for --export-only recovery; torch.cuda.empty_cache() + gc.collect() between steps; patches: export_amax CPU fallback, get_activation_scaling_factor clamp instead of assert	2026-05-09 06:31:08 +00:00
biondizzle	3907838409	Remove ModuleList patch (already fixed in modelopt 0.45), fix numbering	2026-05-09 06:10:18 +00:00
biondizzle	382c1d872f	Fix quant_module import path	2026-05-09 06:09:17 +00:00
biondizzle	9291165ba0	Fix imports: QUANT_CFG_CHOICES is in hf_ptq, not modelopt config	2026-05-09 06:08:35 +00:00
biondizzle	a0bacb3cf6	Replace shell wrapper with in-process quantize script. Changes: new scripts/quantize_nvfp4.py runs full ModelOpt pipeline in-process; saves calibrated state after calibration (insurance against export crashes); patches modelopt for V4 (ModuleList quantizers, stale GPU tensor safety); --export-only flag to retry export from saved calibration state; removed old model_opt_nvfp4_full.py (shell wrapper); updated README with new pipeline docs and bug #5/#6	2026-05-09 06:07:22 +00:00

5 Commits