deepseek-v4-quant

Files

biondizzle 6eaba26914 Defensive quantization: snapshot amax to CPU immediately after calibration

Key changes:
- snapshot_amax_to_cpu(): copies every quantizer's _amax to CPU and saves
  it to disk (~50MB) right after mtq.quantize() returns, before any other
  GPU operation can corrupt it
- force_all_amax_to_cpu(): nuclear option that also moves _pre_quant_scale
  and _global_amax to CPU
- _FORCE_AMAX_CPU flag + patched amax setter: after calibration, any
  future amax writes go to CPU instead of GPU
- --validate-only mode to check saved state without running anything
- restore_amax_from_snapshot() for --export-only recovery
- torch.cuda.empty_cache() + gc.collect() between steps
- Patches: CPU fallback in export_amax; get_activation_scaling_factor
  now clamps instead of asserting
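The snapshot/restore idea in the commit above can be sketched in plain PyTorch. This is a minimal illustration, not the repository's snapshot_amax_to_cpu() or restore_amax_from_snapshot(): the helper names snapshot_amax and restore_amax are hypothetical, and it assumes each quantizer module keeps its calibrated range in a tensor attribute named `_amax`.

```python
from typing import Dict, Optional

import torch
import torch.nn as nn


def snapshot_amax(model: nn.Module, path: Optional[str] = None) -> Dict[str, torch.Tensor]:
    """Hypothetical helper: copy every `_amax` tensor in the model to CPU.

    Assumes quantizer submodules expose their calibrated range as `_amax`.
    """
    snapshot: Dict[str, torch.Tensor] = {}
    for name, module in model.named_modules():
        amax = getattr(module, "_amax", None)
        if isinstance(amax, torch.Tensor):
            # detach + clone so later GPU work cannot alias or mutate this copy
            snapshot[name] = amax.detach().to("cpu").clone()
    if path is not None:
        torch.save(snapshot, path)  # persist so a crashed run can be recovered
    return snapshot


def restore_amax(model: nn.Module, snapshot: Dict[str, torch.Tensor]) -> int:
    """Hypothetical helper: write snapshot values back; returns how many were restored."""
    modules = dict(model.named_modules())
    restored = 0
    for name, value in snapshot.items():
        module = modules.get(name)
        target = getattr(module, "_amax", None) if module is not None else None
        if not isinstance(target, torch.Tensor):
            continue  # module missing or has no `_amax`; skip it
        with torch.no_grad():
            # in-place copy keeps the live tensor's device and dtype
            target.copy_(value)
        restored += 1
    return restored
```

Copying in place with copy_ means the restored values land on whatever device the live tensor already uses, so the same snapshot file works whether the model is on CPU or GPU at export time.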

2026-05-09 06:31:08 +00:00

dequant_fp8_to_bf16.py

Add resume capability to dequant script (skip already-done shards)
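Resuming by skipping already-done shards usually comes down to checking for the output file before converting. The sketch below is an assumed pattern, not the code in dequant_fp8_to_bf16.py: convert_shards is a hypothetical name, and `convert` stands in for whatever per-shard FP8 to BF16 conversion the script performs. Writing to a temporary name and renaming at the end keeps a crash mid-shard from leaving a file that looks finished.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_shards(
    shards: Iterable[Path],
    out_dir: Path,
    convert: Callable[[Path, Path], None],
) -> List[Path]:
    """Hypothetical helper: convert each shard unless its output already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted: List[Path] = []
    for src in shards:
        dst = out_dir / src.name
        if dst.exists():
            continue  # finished in a previous run; skip it
        tmp = dst.with_name(dst.name + ".partial")
        convert(src, tmp)
        # rename only after the write completes (atomic on the same filesystem)
        tmp.replace(dst)
        converted.append(src)
    return converted
```

A rerun after an interruption then redoes only the shard that was in progress, since its `.partial` file never got renamed.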

2026-05-08 02:58:24 +00:00

quantize_nvfp4.py

Defensive quantization: snapshot amax to CPU immediately after calibration

2026-05-09 06:31:08 +00:00