deepseek-v4-quant

Author	SHA1	Message	Date
biondizzle	99f861f48a	Update README and memory: Run 3 OOM crash, Run 4 running on `f9bbef8` - Added Run 3 to table (model loading OOM, fixed with get_model()) - Added Run 4 (current, commit `f9bbef8`) - Added bug #7 (model loading OOM during expert weight concat) - Added 'do NOT repeat' for AutoModelForCausalLM.from_pretrained - Documented all 5 runtime patches - Noted only divergence from modelopt example: get_model()	2026-05-09 08:10:04 +00:00
biondizzle	9438af5a8c	Add commit hashes to run history table	2026-05-09 06:47:26 +00:00
biondizzle	d7593fc1dd	Update README: run history table, bug #1 already fixed, cost note, don't-repeat mistakes	2026-05-09 06:44:17 +00:00
biondizzle	a0bacb3cf6	Replace shell wrapper with in-process quantize script - New scripts/quantize_nvfp4.py: runs full ModelOpt pipeline in-process - Saves calibrated state after calibration (insurance against export crashes) - Patches modelopt for V4: ModuleList quantizers, stale GPU tensor safety - --export-only flag to retry export from saved calibration state - Removed old model_opt_nvfp4_full.py (shell wrapper) - Updated README with new pipeline docs and bug #5/#6	2026-05-09 06:07:22 +00:00
biondizzle	04304fdae6	Add export crash fix patches, update README with bug #5 (repr CUDA crash)	2026-05-08 23:28:32 +00:00
biondizzle	50348989b2	Clarify: V4 is NOT BF16, dequantize first	2026-05-08 17:31:35 +00:00
biondizzle	24e3b3745d	Pin modelopt and transformers versions in README	2026-05-08 17:23:10 +00:00
biondizzle	a2370006f7	Update README: document full pipeline, BF16 verification, calib 128 constraint	2026-05-08 17:17:48 +00:00
biondizzle	eeba101cc4	Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline	2026-05-08 17:02:07 +00:00
biondizzle	b32bb2e84d	NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro	2026-05-07 00:11:31 +00:00
biondizzle	4708cdebb2	init commit	2026-05-06 23:47:07 +00:00

11 Commits