deepseek-v4-quant

biondizzle/deepseek-v4-quant

Commit Graph

master

mega-moe-nvfp4

modelopt-nvfp4

nvidia-modelopt

9438af5a8c Add commit hashes to run history table biondizzle 2026-05-09 06:47:26 +00:00
d7593fc1dd Update README: run history table, bug #1 already fixed, cost note, don't-repeat mistakes biondizzle 2026-05-09 06:44:17 +00:00
6eaba26914 Defensive quantization: snapshot amax to CPU immediately after calibration biondizzle 2026-05-09 06:31:08 +00:00
3907838409 Remove ModuleList patch (already fixed in modelopt 0.45), fix numbering biondizzle 2026-05-09 06:10:18 +00:00
382c1d872f Fix quant_module import path biondizzle 2026-05-09 06:09:17 +00:00
9291165ba0 Fix imports: QUANT_CFG_CHOICES is in hf_ptq, not modelopt config biondizzle 2026-05-09 06:08:35 +00:00
a0bacb3cf6 Replace shell wrapper with in-process quantize script biondizzle 2026-05-09 06:07:22 +00:00
04304fdae6 Add export crash fix patches, update README with bug #5 (repr CUDA crash) biondizzle 2026-05-08 23:28:32 +00:00
50348989b2 Clarify: V4 is NOT BF16, dequantize first biondizzle 2026-05-08 17:31:35 +00:00
24e3b3745d Pin modelopt and transformers versions in README biondizzle 2026-05-08 17:23:10 +00:00
b08afea425 remove weird session dump crap biondizzle 2026-05-08 17:21:18 +00:00
a2370006f7 Update README: document full pipeline, BF16 verification, calib 128 constraint biondizzle 2026-05-08 17:17:48 +00:00
f1d21900ea Remove upcast_to_bf16.py — superseded by dequant_fp8_to_bf16.py biondizzle 2026-05-08 17:13:39 +00:00
ca9a4f5eaa Purge OpenClaw session files, memory dumps, __pycache__. Update .gitignore biondizzle 2026-05-08 17:09:59 +00:00
eeba101cc4 Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline biondizzle 2026-05-08 17:02:07 +00:00
075da675dc fix: update HF token, echo it at runtime, export both HF_TOKEN and HUGGING_FACE_HUB_TOKEN biondizzle 2026-05-08 16:57:32 +00:00
36e1342270 nvfp4_full: pass HF_TOKEN env var for gated calibration dataset biondizzle 2026-05-08 13:33:45 +00:00
3d38e1d5cd nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom biondizzle 2026-05-08 06:24:45 +00:00
d0fc5338fe model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh biondizzle 2026-05-08 05:50:16 +00:00
b70a04696e Add resume capability to dequant script (skip already-done shards) biondizzle 2026-05-08 02:58:24 +00:00
f63eed5cfd Purge INT4 references — expert weights are FP4 (E2M1), not INT4 biondizzle 2026-05-08 02:33:46 +00:00
f8533197f2 Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis biondizzle 2026-05-08 02:25:43 +00:00
b5d569218c Add full nvfp4 quantization script + complete dequant script biondizzle 2026-05-08 01:50:53 +00:00
db6beb5b76 Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts biondizzle 2026-05-08 01:39:50 +00:00
cbfc5a9afb Update nvfp4_experts_only to use dequantized BF16 model biondizzle 2026-05-07 16:34:37 +00:00
b5d14aa8b8 Add proper FP8→BF16 dequantization script biondizzle 2026-05-07 15:45:46 +00:00
6008cf128d Add model_opt_nvfp4_experts_only.py biondizzle 2026-05-07 15:15:32 +00:00
a7664aee7d Add BF16 upcast script and Blackwell DeepGEMM patch biondizzle 2026-05-07 14:26:05 +00:00
7a3b81e833 Add BF16 upcast script and Blackwell DeepGEMM patch biondizzle 2026-05-07 14:25:20 +00:00
ef89ceffbd Add ModelOpt NVFP4 pipeline: patch, run script, README biondizzle 2026-05-07 07:22:54 +00:00
a0bcabac5a NVFP4-everything: quantize all 2D Linear weights including attention and lm_head [branch: master] biondizzle 2026-05-07 03:38:02 +00:00
116933dcf6 Fix: skip .cuda() when low_memory_mode; switch default to nvfp4 [branch: nvidia-modelopt] biondizzle 2026-05-07 03:06:33 +00:00
b8bdd00d19 Lower GPU max_memory to 100GiB, add CPU-only fallback for low_memory_mode biondizzle 2026-05-07 02:49:24 +00:00
717151b98c Add CPU offloading and max_memory caps for FP8 model loading biondizzle 2026-05-07 02:40:48 +00:00
aff12c6951 Fix forward_loop: pass as callable, not via create_forward_loop biondizzle 2026-05-07 02:08:09 +00:00
492e44c0f6 Fix dataloader API: max_sample_length not seq_len, proper create_forward_loop biondizzle 2026-05-07 02:04:54 +00:00
b32bb2e84d NVIDIA Model Optimizer branch: nvfp4_experts_only PTQ for DeepSeek V4 Pro biondizzle 2026-05-07 00:11:31 +00:00
c40607053b Fix remaining gate_proj/up_proj -> w1/w3 references in paired_names biondizzle 2026-05-07 00:05:55 +00:00
771e42cef3 Fix expert pair dict keys: w1/w3 not gate_proj/up_proj biondizzle 2026-05-07 00:05:25 +00:00
5f35a5d2b3 Gracefully handle missing scale tensors (BF16 weights with stale index entries) biondizzle 2026-05-07 00:04:29 +00:00
4470653e15 Fix V4 tensor naming: .scale companions, w1/w3 expert pairs, ffn.gate, hc_* preserve biondizzle 2026-05-07 00:03:20 +00:00
2b7f063e39 7 commit biondizzle 2026-05-06 23:51:54 +00:00
be16bd023e sixth commit biondizzle 2026-05-06 23:50:51 +00:00
97e7638abc sixth commit biondizzle 2026-05-06 23:49:34 +00:00
75503a1190 fifth commit biondizzle 2026-05-06 23:49:02 +00:00
2eeeefcf8f fourth commit biondizzle 2026-05-06 23:48:38 +00:00
31a4302ab6 third commit biondizzle 2026-05-06 23:48:25 +00:00
18ba8e057f second commit biondizzle 2026-05-06 23:47:38 +00:00
4708cdebb2 init commit biondizzle 2026-05-06 23:47:07 +00:00
