-
9438af5a8c
Add commit hashes to run history table
biondizzle
2026-05-09 06:47:26 +00:00
-
d7593fc1dd
Update README: run history table, bug #1 already fixed, cost note, don't-repeat mistakes
biondizzle
2026-05-09 06:44:17 +00:00
-
6eaba26914
Defensive quantization: snapshot amax to CPU immediately after calibration
biondizzle
2026-05-09 06:31:08 +00:00
-
3907838409
Remove ModuleList patch (already fixed in modelopt 0.45), fix numbering
biondizzle
2026-05-09 06:10:18 +00:00
-
382c1d872f
Fix quant_module import path
biondizzle
2026-05-09 06:09:17 +00:00
-
9291165ba0
Fix imports: QUANT_CFG_CHOICES is in hf_ptq, not modelopt config
biondizzle
2026-05-09 06:08:35 +00:00
-
a0bacb3cf6
Replace shell wrapper with in-process quantize script
biondizzle
2026-05-09 06:07:22 +00:00
-
04304fdae6
Add export crash fix patches, update README with bug #5 (repr CUDA crash)
biondizzle
2026-05-08 23:28:32 +00:00
-
50348989b2
Clarify: V4 is NOT BF16, dequantize first
biondizzle
2026-05-08 17:31:35 +00:00
-
24e3b3745d
Pin modelopt and transformers versions in README
biondizzle
2026-05-08 17:23:10 +00:00
-
b08afea425
remove weird session dump crap
biondizzle
2026-05-08 17:21:18 +00:00
-
a2370006f7
Update README: document full pipeline, BF16 verification, calib 128 constraint
biondizzle
2026-05-08 17:17:48 +00:00
-
f1d21900ea
Remove upcast_to_bf16.py — superseded by dequant_fp8_to_bf16.py
biondizzle
2026-05-08 17:13:39 +00:00
-
ca9a4f5eaa
Purge OpenClaw session files, memory dumps, __pycache__. Update .gitignore
biondizzle
2026-05-08 17:09:59 +00:00
-
eeba101cc4
Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline
biondizzle
2026-05-08 17:02:07 +00:00
-
075da675dc
fix: update HF token, echo it at runtime, export both HF_TOKEN and HUGGING_FACE_HUB_TOKEN
biondizzle
2026-05-08 16:57:32 +00:00
-
36e1342270
nvfp4_full: pass HF_TOKEN env var for gated calibration dataset
biondizzle
2026-05-08 13:33:45 +00:00
-
3d38e1d5cd
nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom
biondizzle
2026-05-08 06:24:45 +00:00
-
d0fc5338fe
model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh
biondizzle
2026-05-08 05:50:16 +00:00
-
b70a04696e
Add resume capability to dequant script (skip already-done shards)
biondizzle
2026-05-08 02:58:24 +00:00
-
f63eed5cfd
Purge INT4 references — expert weights are FP4 (E2M1), not INT4
biondizzle
2026-05-08 02:33:46 +00:00
-
f8533197f2
Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis
biondizzle
2026-05-08 02:25:43 +00:00
-
b5d569218c
Add full nvfp4 quantization script + complete dequant script
biondizzle
2026-05-08 01:50:53 +00:00
-
db6beb5b76
Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts
biondizzle
2026-05-08 01:39:50 +00:00
-
cbfc5a9afb
Update nvfp4_experts_only to use dequantized BF16 model
biondizzle
2026-05-07 16:34:37 +00:00
-
b5d14aa8b8
Add proper FP8→BF16 dequantization script
biondizzle
2026-05-07 15:45:46 +00:00
-
6008cf128d
Add model_opt_nvfp4_experts_only.py
biondizzle
2026-05-07 15:15:32 +00:00
-
a7664aee7d
Add BF16 upcast script and Blackwell DeepGEMM patch
biondizzle
2026-05-07 14:26:05 +00:00
-
7a3b81e833
Add BF16 upcast script and Blackwell DeepGEMM patch
biondizzle
2026-05-07 14:25:20 +00:00
-
ef89ceffbd
Add ModelOpt NVFP4 pipeline: patch, run script, README
biondizzle
2026-05-07 07:22:54 +00:00
-
a0bcabac5a
NVFP4-everything: quantize all 2D Linear weights including attention and lm_head
master
biondizzle
2026-05-07 03:38:02 +00:00
-
116933dcf6
Fix: skip .cuda() when low_memory_mode; switch default to nvfp4
nvidia-modelopt
biondizzle
2026-05-07 03:06:33 +00:00
-
b8bdd00d19
Lower GPU max_memory to 100GiB, add CPU-only fallback for low_memory_mode
biondizzle
2026-05-07 02:49:24 +00:00
-
717151b98c
Add CPU offloading and max_memory caps for FP8 model loading
biondizzle
2026-05-07 02:40:48 +00:00
-
aff12c6951
Fix forward_loop: pass as callable, not via create_forward_loop
biondizzle
2026-05-07 02:08:09 +00:00
-
492e44c0f6
Fix dataloader API: max_sample_length not seq_len, proper create_forward_loop
biondizzle
2026-05-07 02:04:54 +00:00
-
b32bb2e84d
NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro
biondizzle
2026-05-07 00:11:31 +00:00
-
c40607053b
Fix remaining gate_proj/up_proj -> w1/w3 references in paired_names
biondizzle
2026-05-07 00:05:55 +00:00
-
771e42cef3
Fix expert pair dict keys: w1/w3 not gate_proj/up_proj
biondizzle
2026-05-07 00:05:25 +00:00
-
5f35a5d2b3
Gracefully handle missing scale tensors (BF16 weights with stale index entries)
biondizzle
2026-05-07 00:04:29 +00:00
-
4470653e15
Fix V4 tensor naming: .scale companions, w1/w3 expert pairs, ffn.gate, hc_* preserve
biondizzle
2026-05-07 00:03:20 +00:00
-
2b7f063e39
7 commit
biondizzle
2026-05-06 23:51:54 +00:00
-
be16bd023e
sixth commit
biondizzle
2026-05-06 23:50:51 +00:00
-
97e7638abc
sixth commit
biondizzle
2026-05-06 23:49:34 +00:00
-
75503a1190
fifth commit
biondizzle
2026-05-06 23:49:02 +00:00
-
2eeeefcf8f
fourth commit
biondizzle
2026-05-06 23:48:38 +00:00
-
31a4302ab6
third commit
biondizzle
2026-05-06 23:48:25 +00:00
-
18ba8e057f
second commit
biondizzle
2026-05-06 23:47:38 +00:00
-
4708cdebb2
init commit
biondizzle
2026-05-06 23:47:07 +00:00