Commit Graph

11 Commits

Author SHA1 Message Date
99f861f48a Update README and memory: Run 3 OOM crash, Run 4 running on f9bbef8 2026-05-09 08:10:04 +00:00
- Added Run 3 to table (model loading OOM, fixed with get_model())
- Added Run 4 (current, commit f9bbef8)
- Added bug #7 (model loading OOM during expert weight concat)
- Added 'do NOT repeat' for AutoModelForCausalLM.from_pretrained
- Documented all 5 runtime patches
- Noted only divergence from modelopt example: get_model()
9438af5a8c Add commit hashes to run history table 2026-05-09 06:47:26 +00:00
d7593fc1dd Update README: run history table, bug #1 already fixed, cost note, don't-repeat mistakes 2026-05-09 06:44:17 +00:00
a0bacb3cf6 Replace shell wrapper with in-process quantize script 2026-05-09 06:07:22 +00:00
- New scripts/quantize_nvfp4.py: runs full ModelOpt pipeline in-process
- Saves calibrated state after calibration (insurance against export crashes)
- Patches modelopt for V4: ModuleList quantizers, stale GPU tensor safety
- --export-only flag to retry export from saved calibration state
- Removed old model_opt_nvfp4_full.py (shell wrapper)
- Updated README with new pipeline docs and bug #5/#6
04304fdae6 Add export crash fix patches, update README with bug #5 (repr CUDA crash) 2026-05-08 23:28:32 +00:00
50348989b2 Clarify: V4 is NOT BF16, dequantize first 2026-05-08 17:31:35 +00:00
24e3b3745d Pin modelopt and transformers versions in README 2026-05-08 17:23:10 +00:00
a2370006f7 Update README: document full pipeline, BF16 verification, calib 128 constraint 2026-05-08 17:17:48 +00:00
eeba101cc4 Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline 2026-05-08 17:02:07 +00:00
b32bb2e84d NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro 2026-05-07 00:11:31 +00:00
4708cdebb2 init commit 2026-05-06 23:47:07 +00:00