nvfp4-megamoe-kernel

biondizzle a7eae10ef4 fix: use checkpoint input_scale for activation quantization

Critical fix: the checkpoint's input_scale is the scale that was in effect
during weight calibration, but we were computing a dynamic scale from the
data (amax/2688) instead. The dynamic scale was 13x off from the checkpoint
value.

Changes:
- stage_activation() accepts optional input_global_scale parameter
- nvfp4_mega_moe_full() accepts l1_input_scale and l2_input_scale
- vLLM patch preserves w13/w2_input_scale in finalize_weights
- L1 activation uses checkpoint w13_input_scale for quantization
- L2 activation uses checkpoint w2_input_scale for quantization
- alpha = input_scale * weight_scale_2 (correct calibration contract)
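
The scale selection described above can be sketched roughly as follows. This is a minimal illustration, not the kernel's actual code: `select_global_scale` and `gemm_alpha` are hypothetical helper names, and the real signatures of stage_activation() and nvfp4_mega_moe_full() are not shown in this commit. It assumes that 2688 is the product of the FP4 E2M1 maximum (6) and the FP8 E4M3 maximum (448), and that the dynamic scale is amax/2688 as stated in the message.

```python
from typing import Optional, Sequence

FP4_E2M1_MAX = 6.0    # largest magnitude representable in FP4 E2M1
FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3
NVFP4_RANGE = FP4_E2M1_MAX * FP8_E4M3_MAX  # 2688.0, the divisor in amax/2688


def select_global_scale(
    activations: Sequence[float],
    input_global_scale: Optional[float] = None,
) -> float:
    """Pick the global activation scale (hypothetical helper).

    If a checkpoint scale is supplied, use it unchanged so activations are
    quantized with the same scale the weights were calibrated against.
    Otherwise fall back to the old dynamic behaviour: amax / 2688.
    """
    if input_global_scale is not None:
        return float(input_global_scale)
    amax = max((abs(x) for x in activations), default=0.0)
    return amax / NVFP4_RANGE


def gemm_alpha(input_scale: float, weight_scale_2: float) -> float:
    """alpha = input_scale * weight_scale_2, as stated in the commit message."""
    return input_scale * weight_scale_2


if __name__ == "__main__":
    acts = [0.5, -3.0, 1.25]
    print(select_global_scale(acts))        # dynamic fallback
    print(select_global_scale(acts, 0.02))  # checkpoint scale wins
    print(gemm_alpha(0.02, 0.5))
```

Under these assumptions, passing the checkpoint's w13_input_scale for L1 and w2_input_scale for L2 bypasses the dynamic path entirely, so alpha is built from the same input scale that was in effect during calibration.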

2026-05-15 23:57:08 +00:00

patches