biondizzle
  • Joined on 2025-12-10
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:23:11 +00:00
24e3b3745d Pin modelopt and transformers versions in README
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:21:22 +00:00
b08afea425 remove weird session dump crap
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:17:49 +00:00
a2370006f7 Update README: document full pipeline, BF16 verification, calib 128 constraint
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:13:41 +00:00
f1d21900ea Remove upcast_to_bf16.py — superseded by dequant_fp8_to_bf16.py
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:10:00 +00:00
ca9a4f5eaa Purge OpenClaw session files, memory dumps, __pycache__. Update .gitignore
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 17:02:09 +00:00
eeba101cc4 Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 16:57:35 +00:00
075da675dc fix: update HF token, echo it at runtime, export both HF_TOKEN and HUGGING_FACE_HUB_TOKEN
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 13:33:48 +00:00
36e1342270 nvfp4_full: pass HF_TOKEN env var for gated calibration dataset
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 06:24:49 +00:00
3d38e1d5cd nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 05:50:18 +00:00
d0fc5338fe model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 02:58:25 +00:00
b70a04696e Add resume capability to dequant script (skip already-done shards)
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 02:33:48 +00:00
f63eed5cfd Purge INT4 references — expert weights are FP4 (E2M1), not INT4
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 02:25:44 +00:00
f8533197f2 Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 01:50:54 +00:00
b5d569218c Add full nvfp4 quantization script + complete dequant script
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-08 01:39:51 +00:00
db6beb5b76 Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts
biondizzle pushed to dream-build-glm at biondizzle/vllm-with-lmcache 2026-05-07 19:50:48 +00:00
590189272a patch in the slow kv connector waits
biondizzle pushed to dream-build-glm at biondizzle/vllm-with-lmcache 2026-05-07 19:29:12 +00:00
091da2b61b Merge branch 'dream-build' into dream-build-glm
c0c05d5572 official tweax
biondizzle pushed to dream-build at biondizzle/vllm-with-lmcache 2026-05-07 19:10:34 +00:00
c0c05d5572 official tweax
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-07 16:34:39 +00:00
cbfc5a9afb Update nvfp4_experts_only to use dequantized BF16 model
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-07 15:45:47 +00:00
b5d14aa8b8 Add proper FP8→BF16 dequantization script