deepseek-v4-quant

Files

biondizzle f8533197f2 Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis

Counts of nibble codes 0 and 8 have a ratio of 0.996, as expected for FP4, where 0x8 is -0.0 and mirrors +0.0 (0x0). In INT4, 0x8 would be -8 and rare.
FP4 dequant uses E2M1 LUT lookup × E8M0 scale (MXFP4 microscaling).
Also adds model_opt_nvfp4_full.py for full model NVFP4 quantization.
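
The nibble check and the LUT-based dequant described in this commit message can be sketched as follows. This is an illustration only, not the contents of dequant_fp8_to_bf16.py: the function names, the low-nibble-first packing order, the one-scale-per-32-values layout, and the use of NumPy are all assumptions. The E2M1 code table and the E8M0 bias of 127 follow the OCP Microscaling (MX) format definition.

```python
import numpy as np

# E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit.
# Codes 0..7 are non-negative; codes 8..15 repeat them with the sign bit set.
E2M1_LUT = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)


def unpack_nibbles(packed):
    """Split each uint8 into two 4-bit codes.

    Assumption: the low nibble holds the first element of each pair.
    """
    packed = np.asarray(packed, dtype=np.uint8)
    lo = packed & 0x0F
    hi = packed >> 4
    # Interleave lo/hi along the last axis: (..., n) -> (..., 2n)
    return np.stack([lo, hi], axis=-1).reshape(*packed.shape[:-1], -1)


def nibble_histogram(packed):
    """Count how often each of the 16 nibble codes occurs."""
    return np.bincount(unpack_nibbles(packed).ravel(), minlength=16)


def dequant_mxfp4(packed, scales_e8m0, block_size=32):
    """Dequantize packed FP4 codes with one E8M0 scale per block.

    Assumption: scales_e8m0 has shape (..., n_values // block_size).
    The E8M0 NaN encoding (0xFF) is not handled in this sketch.
    """
    values = E2M1_LUT[unpack_nibbles(packed)]
    # E8M0 stores only a biased exponent: scale = 2 ** (byte - 127)
    exponents = np.asarray(scales_e8m0).astype(np.float32) - 127.0
    scales = np.exp2(exponents)
    return values * np.repeat(scales, block_size, axis=-1)
```

With a histogram from `nibble_histogram`, the ratio `counts[0] / counts[8]` sits near 1 when code 8 is a signed zero (FP4), whereas a two's-complement INT4 tensor would be expected to show few occurrences of code 8 (-8).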

2026-05-08 02:25:43 +00:00

dequant_fp8_to_bf16.py

Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis

2026-05-08 02:25:43 +00:00

model_opt_nvfp4_experts_only.py

Update nvfp4_experts_only to use dequantized BF16 model

2026-05-07 16:34:37 +00:00

model_opt_nvfp4_full.py

Add full nvfp4 quantization script + complete dequant script

2026-05-08 01:50:53 +00:00

run_modelopt_nvfp4.sh

Add ModelOpt NVFP4 pipeline: patch, run script, README

2026-05-07 07:22:54 +00:00

upcast_to_bf16.py

Add BF16 upcast script and Blackwell DeepGEMM patch

2026-05-07 14:25:30 +00:00