Commit Graph

5 Commits

SHA1 Message Date
36e1342270 nvfp4_full: pass HF_TOKEN env var for gated calibration dataset 2026-05-08 13:33:45 +00:00
3d38e1d5cd nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom 2026-05-08 06:24:45 +00:00
d0fc5338fe model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh 2026-05-08 05:50:16 +00:00
f63eed5cfd Purge INT4 references - expert weights are FP4 (E2M1), not INT4 2026-05-08 02:33:46 +00:00
  All docs and scripts updated. Historical memory entries annotated.
b5d569218c Add full nvfp4 quantization script + complete dequant script 2026-05-08 01:50:53 +00:00
  - model_opt_nvfp4_full.py: Full NVFP4 quantization (not experts-only)
    Uses --gpu_max_mem_percentage 0.9 instead of --use_seq_device_map
  - dequant_fp8_to_bf16.py: Now handles INT4-packed experts + FP8 shared
    experts + FP8 attention. Complete dequant to pure BF16.