deepseek-v4-quant

Author	SHA1	Message	Date
biondizzle	36e1342270	nvfp4_full: pass HF_TOKEN env var for gated calibration dataset	2026-05-08 13:33:45 +00:00
biondizzle	3d38e1d5cd	nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom	2026-05-08 06:24:45 +00:00
biondizzle	d0fc5338fe	model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh	2026-05-08 05:50:16 +00:00
biondizzle	f63eed5cfd	Purge INT4 references: expert weights are FP4 (E2M1), not INT4. All docs and scripts updated. Historical memory entries annotated.	2026-05-08 02:33:46 +00:00
biondizzle	b5d569218c	Add full nvfp4 quantization script + complete dequant script. model_opt_nvfp4_full.py: full NVFP4 quantization (not experts-only); uses --gpu_max_mem_percentage 0.9 instead of --use_seq_device_map. dequant_fp8_to_bf16.py: now handles INT4-packed experts + FP8 shared experts + FP8 attention; complete dequant to pure BF16.	2026-05-08 01:50:53 +00:00