deepseek-v4-quant

Files

biondizzle 03c10ab3b6 Fix model loading: use modelopt get_model() instead of raw AutoModelForCausalLM

Raw from_pretrained runs out of memory during weight conversion: torch.cat on
the expert gate_up_proj tries to allocate 31.5GB on a GPU with only 25.9GB free.
modelopt's get_model() sets max_memory/device_map for models that need
sequential device mapping.
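
The max_memory/device_map idea mentioned above can be illustrated with a small sketch. This is not modelopt's get_model() implementation, whose internals are not shown here. The helper name build_max_memory, the headroom default, and the cpu_gib parameter are hypothetical. It only shows how a per-GPU cap below the currently free memory could be computed, so that temporary allocations during loading have some room.

```python
def build_max_memory(free_bytes_per_gpu, headroom=0.2, cpu_gib=None):
    """Build a max_memory mapping that caps each GPU below its free memory.

    free_bytes_per_gpu: free bytes per device, in device-index order.
    headroom: fraction of free memory left unassigned (hypothetical default).
    cpu_gib: optional CPU offload budget in GiB.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")

    gib = 1024 ** 3
    max_memory = {}
    for index, free_bytes in enumerate(free_bytes_per_gpu):
        # Round down to whole GiB so the cap never exceeds the budget.
        usable_gib = int(free_bytes * (1.0 - headroom)) // gib
        max_memory[index] = f"{usable_gib}GiB"

    if cpu_gib is not None:
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


# Assumed usage with plain transformers (not executed here):
#   free = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path,                      # placeholder path
#       device_map="sequential",
#       max_memory=build_max_memory(free),
#   )
```

With device_map="sequential", devices are filled in index order up to their cap. Whether a cap alone avoids the torch.cat allocation described above depends on the model and hardware, so treat the headroom value as something to tune.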

2026-05-09 08:00:50 +00:00

dequant_fp8_to_bf16.py

Add resume capability to dequant script (skip already-done shards)

2026-05-08 02:58:24 +00:00
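
A resume step like the one described in this commit can be sketched as follows. This is a minimal illustration, not the code in dequant_fp8_to_bf16.py. The function name pending_shards, the "*.safetensors" pattern, and the convention that an output shard keeps the input shard's file name are all assumptions.

```python
from pathlib import Path


def pending_shards(input_dir, output_dir, pattern="*.safetensors"):
    """Return input shards that have no non-empty counterpart in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pending = []
    for shard in sorted(input_dir.glob(pattern)):
        done = output_dir / shard.name
        # Treat a shard as finished only if its output exists and is non-empty.
        if done.exists() and done.stat().st_size > 0:
            continue
        pending.append(shard)
    return pending
```

A size check does not catch a shard that was only partly written before an interruption. Writing each output to a temporary name and renaming it on completion is one way to make the existence check reliable.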

quantize_nvfp4.py

Fix model loading: use modelopt get_model() instead of raw AutoModelForCausalLM

2026-05-09 08:00:50 +00:00