nvfp4-megamoe-kernel/vllm/kernels/linear
biondizzle ffc2264c41 Fix activation global scale: don't double-invert input_global_scale_inv
The activation global scale (gs) is amax / (6.0 * 448.0). Both the
linear kernel and the MoE kernel computed 1.0 / x on a value x that was
already the correct gs. That inverted it a second time and produced
garbage quantization.

Linear kernel: input_global_scale_inv IS the gs, so use it directly.
MoE kernel: w13_input_scale_orig (after undoing convert inversion) IS
the gs, so use it directly.
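
A minimal sketch of the double inversion described above, in plain Python rather than the actual kernel code. The function names and the sample amax are hypothetical. The constants 6.0 and 448.0 are read here as the largest magnitudes of FP4 E2M1 and FP8 E4M3. The stored value is assumed to already equal gs, as the commit message states.

```python
FP4_E2M1_MAX = 6.0    # largest magnitude representable in FP4 E2M1
FP8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3


def activation_global_scale(amax: float) -> float:
    """gs = amax / (6.0 * 448.0), the formula from the commit message."""
    return amax / (FP4_E2M1_MAX * FP8_E4M3_MAX)


def scale_before_fix(stored_value: float) -> float:
    """Hypothetical old behavior: invert a value that is already gs."""
    return 1.0 / stored_value


def scale_after_fix(stored_value: float) -> float:
    """Hypothetical fixed behavior: the stored value is gs, use it directly."""
    return stored_value


amax = 5.376  # made-up activation amax, for illustration only
gs = activation_global_scale(amax)

# Assumption: this mirrors input_global_scale_inv, which already holds gs.
stored_value = gs

wrong = scale_before_fix(stored_value)
right = scale_after_fix(stored_value)

print(f"gs={gs:.6g} before_fix={wrong:.6g} after_fix={right:.6g}")
```

With this sample amax the pre-fix path returns the reciprocal of gs, so the two scales differ by a factor of 1 / gs**2, which is why the quantized activations came out unusable.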
2026-05-19 06:03:08 +00:00