biondizzle
  • Joined on 2025-12-10
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 02:01:54 +00:00
653e2d7a50 vLLM NVFP4 serving: full end-to-end pipeline working
db16be8e5d S11: Fixed substr mapping, stacking, suffix, and o_a_proj - loads weights but attention forward uses FP8 einsum incompatible with NVFP4
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-10 16:14:34 +00:00
6fd03a0aa0 vLLM serving: patched deepseek_v4.py, disabled mega_moe, updated docs
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-10 09:33:50 +00:00
d88793dee6 Add vllm weight mapper patch and docker-compose
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-10 08:59:32 +00:00
30608e3834 Config patches: document modelopt↔vllm gaps with NVIDIA reference
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-10 08:23:13 +00:00
0d74b97fb2 Config patches doc + compress_ratios runtime patch in serve script
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-10 07:54:35 +00:00
f65d4ab99f Run 11 SUCCESS: 881GB NVFP4 exported, add vLLM serve script
biondizzle pushed to master at biondizzle/vllm-with-media-support 2026-05-10 04:02:31 +00:00
4eb98fe467 Add soundfile - vllm audio needs both av and soundfile
biondizzle pushed to master at biondizzle/vllm-with-media-support 2026-05-10 03:55:41 +00:00
2a01870564 Fix: tag latest for master/main/null branch builds
biondizzle pushed to master at biondizzle/vllm-with-media-support 2026-05-10 03:35:51 +00:00
261d8e58fe Fix: install av directly instead of vllm[audio] with --no-deps
biondizzle pushed to master at biondizzle/vllm-with-media-support 2026-05-10 03:30:06 +00:00
39c76cef64 Add Jenkinsfile
biondizzle created branch master in biondizzle/vllm-with-media-support 2026-05-10 03:25:13 +00:00
biondizzle pushed to master at biondizzle/vllm-with-media-support 2026-05-10 03:25:13 +00:00
1c60fd9738 tweax
biondizzle created repository biondizzle/vllm-with-media-support 2026-05-10 03:16:34 +00:00
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 23:00:20 +00:00
eb80bd6f80 README + memory: Run 10 result (export crash in get_weight_scaling_factor), Run 11 running
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 22:51:00 +00:00
07cd50e823 8 patches covering full export chain — no more whack-a-mole
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 22:43:50 +00:00
efc111a11f Add Patch 4+5: get_weight_scaling_factor and get_weight_scaling_factor_2 CPU safety
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 16:09:11 +00:00
ce9056d259 README overhaul: reflect current architecture (hf_main, run history through Run 10)
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 15:58:38 +00:00
5a72da7193 Fix: apply hf_ptq __main__ post-parse conversions (dataset split, calib_size int list)
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 15:00:26 +00:00
8612914169 Update run history: Runs 7-8, Run 9 running on a300302
biondizzle pushed to modelopt-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-09 14:57:30 +00:00
a300302486 Fix: use hf_ptq.py arg names (--pyt_ckpt_path, --qformat, --inference_tensor_parallel)