deepseek-v4-quant

Files

biondizzle b5d569218c Add full nvfp4 quantization script + complete dequant script

- model_opt_nvfp4_full.py: Full NVFP4 quantization (not experts-only)
  Uses --gpu_max_mem_percentage 0.9 instead of --use_seq_device_map
- dequant_fp8_to_bf16.py: Now handles INT4-packed experts + FP8 shared
  experts + FP8 attention. Complete dequant to pure BF16.
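
For orientation, here is a minimal sketch of what dequantizing INT4-packed weights generally involves. It is not taken from dequant_fp8_to_bf16.py. The nibble order (low nibble first), the signed two's-complement encoding, the per-group scale layout, and the names `unpack_int4` and `dequant_int4` are all assumptions made for illustration; the actual script may use a different packing and most likely operates on tensors rather than Python lists.

```python
from typing import List


def unpack_int4(packed: bytes) -> List[int]:
    """Split each byte into two signed 4-bit values.

    Assumption: the low nibble holds the first value and the high
    nibble the second; a real checkpoint may use the opposite order.
    """
    out: List[int] = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Two's-complement sign extension: 8..15 map to -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dequant_int4(packed: bytes, scales: List[float], group_size: int) -> List[float]:
    """Multiply each unpacked value by the scale of its group.

    Assumption: one scale per `group_size` consecutive values.
    """
    values = unpack_int4(packed)
    return [v * scales[i // group_size] for i, v in enumerate(values)]


if __name__ == "__main__":
    # Hypothetical input: two packed bytes, two groups of two values each.
    print(dequant_int4(bytes([0x21, 0xF8]), scales=[0.5, 2.0], group_size=2))
```

The FP8 paths named in the commit follow the same overall pattern (decode the stored value, multiply by its scale, store the result as BF16), with an FP8 decode in place of the nibble unpacking.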

2026-05-08 01:50:53 +00:00

dequant_fp8_to_bf16.py

Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts

2026-05-08 01:39:50 +00:00

model_opt_nvfp4_experts_only.py

Update nvfp4_experts_only to use dequantized BF16 model

2026-05-07 16:34:37 +00:00

model_opt_nvfp4_full.py

Add full nvfp4 quantization script + complete dequant script

2026-05-08 01:50:53 +00:00

run_modelopt_nvfp4.sh

Add ModelOpt NVFP4 pipeline: patch, run script, README

2026-05-07 07:22:54 +00:00

upcast_to_bf16.py

Add BF16 upcast script and Blackwell DeepGEMM patch

2026-05-07 14:25:30 +00:00