biondizzle

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 02:01:54 +00:00

653e2d7a50 vLLM NVFP4 serving: full end-to-end pipeline working

db16be8e5d S11: Fixed substr mapping, stacking, suffix, and o_a_proj - loads weights but attention forward uses FP8 einsum incompatible with NVFP4

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-10 16:14:34 +00:00

6fd03a0aa0 vLLM serving: patched deepseek_v4.py, disabled mega_moe, updated docs

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-10 09:33:50 +00:00

d88793dee6 Add vllm weight mapper patch and docker-compose

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-10 08:59:32 +00:00

30608e3834 Config patches: document modelopt↔vllm gaps with NVIDIA reference

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-10 08:23:13 +00:00

0d74b97fb2 Config patches doc + compress_ratios runtime patch in serve script

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-10 07:54:35 +00:00

f65d4ab99f Run 11 SUCCESS: 881GB NVFP4 exported, add vLLM serve script

biondizzle pushed to master at biondizzle/vllm-with-media-support

2026-05-10 04:02:31 +00:00

4eb98fe467 Add soundfile - vllm audio needs both av and soundfile

biondizzle pushed to master at biondizzle/vllm-with-media-support

2026-05-10 03:55:41 +00:00

2a01870564 Fix: tag latest for master/main/null branch builds

biondizzle pushed to master at biondizzle/vllm-with-media-support

2026-05-10 03:35:51 +00:00

261d8e58fe Fix: install av directly instead of vllm[audio] with --no-deps

biondizzle pushed to master at biondizzle/vllm-with-media-support

2026-05-10 03:30:06 +00:00

39c76cef64 Add Jenkinsfile

biondizzle created branch master in biondizzle/vllm-with-media-support

2026-05-10 03:25:13 +00:00

biondizzle pushed to master at biondizzle/vllm-with-media-support

2026-05-10 03:25:13 +00:00

1c60fd9738 tweax

biondizzle created repository biondizzle/vllm-with-media-support

2026-05-10 03:16:34 +00:00

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 23:00:20 +00:00

eb80bd6f80 README + memory: Run 10 result (export crash in get_weight_scaling_factor), Run 11 running

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 22:51:00 +00:00

07cd50e823 8 patches covering full export chain — no more whack-a-mole

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 22:43:50 +00:00

efc111a11f Add Patch 4+5: get_weight_scaling_factor and get_weight_scaling_factor_2 CPU safety

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 16:09:11 +00:00

ce9056d259 README overhaul: reflect current architecture (hf_main, run history through Run 10)

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 15:58:38 +00:00

5a72da7193 Fix: apply hf_ptq __main__ post-parse conversions (dataset split, calib_size int list)

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 15:00:26 +00:00

8612914169 Update run history: Runs 7-8, Run 9 running on a300302

biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-09 14:57:30 +00:00

a300302486 Fix: use hf_ptq.py arg names (--pyt_ckpt_path, --qformat, --inference_tensor_parallel)