99f861f48a
Update README and memory: Run 3 OOM crash, Run 4 running on f9bbef8
- Added Run 3 to table (model loading OOM, fixed with get_model())
- Added Run 4 (current, commit f9bbef8)
- Added bug #7 (model loading OOM during expert weight concat)
- Added 'do NOT repeat' for AutoModelForCausalLM.from_pretrained
- Documented all 5 runtime patches
- Noted only divergence from modelopt example: get_model()
2026-05-09 08:10:04 +00:00
9438af5a8c
Add commit hashes to run history table
2026-05-09 06:47:26 +00:00
d7593fc1dd
Update README: run history table, bug #1 already fixed, cost note, don't-repeat mistakes
2026-05-09 06:44:17 +00:00
a0bacb3cf6
Replace shell wrapper with in-process quantize script
- New scripts/quantize_nvfp4.py: runs full ModelOpt pipeline in-process
- Saves calibrated state after calibration (insurance against export crashes)
- Patches modelopt for V4: ModuleList quantizers, stale GPU tensor safety
- --export-only flag to retry export from saved calibration state
- Removed old model_opt_nvfp4_full.py (shell wrapper)
- Updated README with new pipeline docs and bug #5/#6
2026-05-09 06:07:22 +00:00
04304fdae6
Add export crash fix patches, update README with bug #5 (repr CUDA crash)
2026-05-08 23:28:32 +00:00
50348989b2
Clarify: V4 is NOT BF16, dequantize first
2026-05-08 17:31:35 +00:00
24e3b3745d
Pin modelopt and transformers versions in README
2026-05-08 17:23:10 +00:00
a2370006f7
Update README: document full pipeline, BF16 verification, calib 128 constraint
2026-05-08 17:17:48 +00:00
eeba101cc4
Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline
2026-05-08 17:02:07 +00:00
b32bb2e84d
NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro
2026-05-07 00:11:31 +00:00
4708cdebb2
init commit
2026-05-06 23:47:07 +00:00