deepseek-v4-quant

Files

biondizzle db6beb5b76 Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts

INT4 expert weights are packed two per byte into int8, with one float8_e8m0fnu
scale per row per 32-column block. Unpacking order: lower nibble first, then upper.
The last output dimension is 2x the stored one (e.g. [3072,3584] → [3072,7168]).
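
The unpacking described above can be sketched in NumPy as follows. This is a minimal illustration, not the code in dequant_fp8_to_bf16.py. The function names (`unpack_int4`, `decode_e8m0`, `dequant_int4`) are hypothetical. It assumes the nibbles are two's-complement signed values in [-8, 7], that each scale byte is an exponent-only E8M0 value decoding to 2^(e - 127), and that the 32-column blocks are counted in unpacked columns.

```python
import numpy as np

BLOCK = 32  # columns covered by one scale, per the commit message


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack int8 bytes holding two 4-bit values each: [R, C] -> [R, 2*C]."""
    raw = packed.view(np.uint8)
    lo = (raw & 0x0F).astype(np.int8)  # lower nibble comes first
    hi = (raw >> 4).astype(np.int8)    # upper nibble comes second
    # Assumption: nibbles are two's-complement, so 8..15 map to -8..-1.
    lo = np.where(lo >= 8, lo - 16, lo)
    hi = np.where(hi >= 8, hi - 16, hi)
    out = np.empty((raw.shape[0], raw.shape[1] * 2), dtype=np.int8)
    out[:, 0::2] = lo
    out[:, 1::2] = hi
    return out


def decode_e8m0(scale_bytes: np.ndarray) -> np.ndarray:
    """Decode exponent-only 8-bit scales to float32: 2**(e - 127), 0xFF is NaN."""
    raw = scale_bytes.view(np.uint8).astype(np.int32)
    exp = (np.minimum(raw, 254) - 127).astype(np.float32)
    return np.where(raw == 0xFF, np.float32(np.nan), np.exp2(exp))


def dequant_int4(packed: np.ndarray, scale_bytes: np.ndarray) -> np.ndarray:
    """Dequantize [R, C] packed weights with [R, 2*C/BLOCK] scales to float32."""
    q = unpack_int4(packed).astype(np.float32)
    # Assumption: one scale per row per block of 32 unpacked columns.
    scales = np.repeat(decode_e8m0(scale_bytes), BLOCK, axis=1)
    return q * scales
```

Under these assumptions, a stored [3072,3584] tensor would pair with a [3072,224] scale tensor and dequantize to [3072,7168]. The float32 result would still need a cast to BF16, which NumPy does not provide natively.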

Also adds progress output with ETA per shard so screen sessions stay alive.
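
Per-shard progress with an ETA can be sketched as below. This is only an illustration of the idea, not the script's actual logging. `format_eta`, `process_shards`, and the `convert` callback are hypothetical names. The ETA is a simple estimate that assumes every shard takes about as long as the average so far.

```python
import time


def format_eta(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"


def process_shards(shards, convert, clock=time.monotonic):
    """Run convert(shard) on each shard and print one progress line per shard."""
    start = clock()
    total = len(shards)
    for done, shard in enumerate(shards, 1):
        convert(shard)
        elapsed = clock() - start
        # Average time per finished shard times the number of shards left.
        eta = elapsed / done * (total - done)
        print(
            f"[{done}/{total}] {shard} done, "
            f"elapsed {format_eta(elapsed)}, ETA {format_eta(eta)}",
            flush=True,  # make the line appear immediately in the terminal
        )
```

Passing `flush=True` keeps the output from sitting in a buffer when the script runs inside a long-lived terminal session.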

2026-05-08 01:39:50 +00:00

dequant_fp8_to_bf16.py

Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts

2026-05-08 01:39:50 +00:00

model_opt_nvfp4_experts_only.py

Update nvfp4_experts_only to use dequantized BF16 model

2026-05-07 16:34:37 +00:00

run_modelopt_nvfp4.sh

Add ModelOpt NVFP4 pipeline: patch, run script, README

2026-05-07 07:22:54 +00:00

upcast_to_bf16.py

Add BF16 upcast script and Blackwell DeepGEMM patch

2026-05-07 14:25:30 +00:00