biondizzle

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:23:11 +00:00

24e3b3745d Pin modelopt and transformers versions in README

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:21:22 +00:00

b08afea425 remove weird session dump crap

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:17:49 +00:00

a2370006f7 Update README: document full pipeline, BF16 verification, calib 128 constraint

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:13:41 +00:00

f1d21900ea Remove upcast_to_bf16.py — superseded by dequant_fp8_to_bf16.py

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:10:00 +00:00

ca9a4f5eaa Purge OpenClaw session files, memory dumps, __pycache__. Update .gitignore

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 17:02:09 +00:00

eeba101cc4 Cleanup: nuke dead scripts and stale docs, rewrite README for full NVFP4 pipeline

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 16:57:35 +00:00

075da675dc fix: update HF token, echo it at runtime, export both HF_TOKEN and HUGGING_FACE_HUB_TOKEN

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 13:33:48 +00:00

36e1342270 nvfp4_full: pass HF_TOKEN env var for gated calibration dataset

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 06:24:49 +00:00

3d38e1d5cd nvfp4_full: drop calib to 128, gpu_max_mem to 0.7 for VRAM headroom

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 05:50:18 +00:00

d0fc5338fe model_opt_nvfp4_full: add use_seq_device_map, fix source for /bin/sh

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 02:58:25 +00:00

b70a04696e Add resume capability to dequant script (skip already-done shards)

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 02:33:48 +00:00

f63eed5cfd Purge INT4 references — expert weights are FP4 (E2M1), not INT4

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 02:25:44 +00:00

f8533197f2 Fix: expert weights are FP4 (E2M1), not INT4 - verified with nibble analysis

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 01:50:54 +00:00

b5d569218c Add full nvfp4 quantization script + complete dequant script

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-08 01:39:51 +00:00

db6beb5b76 Complete dequant script: handles INT4 experts, FP8 attention, FP8 shared experts

biondizzle pushed to dream-build-glm at biondizzle/vllm-with-lmcache

2026-05-07 19:50:48 +00:00

590189272a patch in the slow kv connector waits

biondizzle pushed to dream-build-glm at biondizzle/vllm-with-lmcache

2026-05-07 19:29:12 +00:00

091da2b61b Merge branch 'dream-build' into dream-build-glm

c0c05d5572 official tweax

biondizzle pushed to dream-build at biondizzle/vllm-with-lmcache

2026-05-07 19:10:34 +00:00

c0c05d5572 official tweax

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-07 16:34:39 +00:00

cbfc5a9afb Update nvfp4_experts_only to use dequantized BF16 model

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-07 15:45:47 +00:00

b5d14aa8b8 Add proper FP8→BF16 dequantization script