- transform_nvfp4_weights_for_mega_moe now accepts weight_scale_2 - Folds global scale into block scales: UE4M3 * FP32 -> UE4M3 - Dequantize to f32, multiply by global scale, clamp [0,448], re-quantize - This is needed because the kernel only applies one level of block scaling