|
|
36e1342270
|
nvfp4_full: pass HF_TOKEN env var for gated calibration dataset
|
2026-05-08 13:33:45 +00:00 |
|
|
|
3d38e1d5cd
|
nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom
|
2026-05-08 06:24:45 +00:00 |
|
|
|
d0fc5338fe
|
model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh
|
2026-05-08 05:50:16 +00:00 |
|
|
|
f63eed5cfd
|
Purge INT4 references — expert weights are FP4 (E2M1), not INT4
All docs and scripts updated. Historical memory entries annotated.
|
2026-05-08 02:33:46 +00:00 |
|
|
|
b5d569218c
|
Add full nvfp4 quantization script + complete dequant script
- model_opt_nvfp4_full.py: Full NVFP4 quantization (not experts-only)
  Uses --gpu_max_mem_percentage 0.9 instead of --use_seq_device_map
- dequant_fp8_to_bf16.py: Now handles INT4-packed experts + FP8 shared
  experts + FP8 attention. Complete dequant to pure BF16.
|
2026-05-08 01:50:53 +00:00 |
|