deepseek-v4-quant

Files

biondizzle e963325b61 WIP: MegaMoE NVFP4 kernel + diagnostics

- Force use_mega_moe=True for NVFP4 pipeline
- DeepseekV4MegaMoEExperts: load NVFP4 params (float8 block scales,
  float32 global/input scales), convert NVFP4→BF16→MXFP4 in
  finalize_weights for the DeepGEMM mega_moe kernel
- Add _nvfp4_to_bf16 and _bf16_to_mxfp4 conversion methods
- Remove expert_dtype check blocking mega_moe
- Add diagnostics for wo_a and bf16 layer conversion
- Still WIP: attention layer bugs under investigation
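
The NVFP4→BF16→MXFP4 path described above can be sketched in plain Python on a single row, so the arithmetic is visible. This is a rough illustration, not the code in deepseek_v4.py. The helper names (`decode_e2m1`, `nvfp4_to_float`, `float_to_mxfp4`) are hypothetical. It assumes the 4-bit codes are already unpacked and the FP8 block scales are already decoded to Python floats. It also assumes NVFP4 dequantization is code value × block scale × global scale with 16-element blocks, and that MXFP4 uses 32-element blocks with one power-of-two scale each. Python floats stand in for BF16.

```python
import math

# Magnitudes representable by 4-bit E2M1; bit 3 of the code is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code):
    """Map an unpacked 4-bit code (0..15) to its real value."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def nvfp4_to_float(codes, block_scales, global_scale, block=16):
    """Dequantize one row of E2M1 codes; one scale per `block` elements.

    Assumed convention: value = code * block_scale * global_scale.
    """
    return [
        decode_e2m1(code) * block_scales[i // block] * global_scale
        for i, code in enumerate(codes)
    ]


def float_to_mxfp4(values, block=32):
    """Requantize one row to E2M1 codes plus one power-of-two exponent per block."""
    codes, exponents = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        # E2M1's largest magnitude, 6.0, has exponent 2, so shift the block's
        # top exponent down by 2. An all-zero block gets exponent 0.
        exponent = math.floor(math.log2(amax)) - 2 if amax > 0 else 0
        exponent = max(-127, min(127, exponent))
        exponents.append(exponent)
        for v in chunk:
            # Clamp to the largest representable magnitude after scaling.
            scaled = min(abs(v) / 2.0 ** exponent, 6.0)
            # Nearest magnitude; ties go to the smaller one (a simplification).
            index = min(range(8), key=lambda k: abs(E2M1_MAGNITUDES[k] - scaled))
            codes.append(index | (0x8 if v < 0 else 0))
    return codes, exponents
```

The second step is lossy in general: NVFP4 block scales can take non-power-of-two values and cover 16 elements, while the MXFP4 scale here is a single power of two shared by 32 elements. A real implementation would also pack two codes per byte and store each exponent in a biased 8-bit form, both of which are omitted here.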

2026-05-11 05:19:49 +00:00

deepseek_v4.py

WIP: MegaMoE NVFP4 kernel + diagnostics

2026-05-11 05:19:49 +00:00

deepseek_v4.py.bak

S11: Fixed substr mapping, stacking, suffix, and o_a_proj - loads weights but attention forward uses FP8 einsum incompatible with NVFP4

2026-05-10 17:45:53 +00:00

deepseek_v4.py.s11

S11: Fixed substr mapping, stacking, suffix, and o_a_proj - loads weights but attention forward uses FP8 einsum incompatible with NVFP4

2026-05-10 17:45:53 +00:00

patch_finegrained_fp8_blackwell.py

Add BF16 upcast script and Blackwell DeepGEMM patch

2026-05-07 14:25:30 +00:00

patch_vllm_weights.py

vLLM serving: patched deepseek_v4.py, disabled mega_moe, updated docs

2026-05-10 16:14:17 +00:00

quant_module_patched.py

Add ModelOpt NVFP4 pipeline: patch, run script, README

2026-05-07 07:22:54 +00:00