The shared experts also go through FlashInferCutlassNvFp4LinearKernel with a broken input_scale, so they need the same BF16 dequant treatment. This applies to gate_up_proj and down_proj on ffn.shared_experts.
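Below is a minimal pure-Python sketch of what that treatment could look like. It assumes "BF16 dequant" means expanding the NVFP4 weights of the affected layers to BF16 up front rather than running them through the FP4 kernel. It also assumes a ModelOpt-style NVFP4 layout: two E2M1 codes packed per byte with the low nibble first, one block scale per 16 elements (already decoded from FP8 E4M3 to float here), and one per-tensor global scale. The helpers `needs_bf16_dequant`, `dequant_nvfp4_row`, and `to_bf16`, and the `model.layers.N` name prefix, are hypothetical and are not vLLM or FlashInfer APIs.

```python
import struct

# Hypothetical helpers for illustration; not vLLM or FlashInfer APIs.

# Projections on ffn.shared_experts that need the BF16 path (from the note).
SHARED_EXPERT_PROJS = ("gate_up_proj", "down_proj")

# FP4 E2M1 code -> value; bit 3 of the code is the sign bit.
E2M1_LUT = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
            -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]


def needs_bf16_dequant(module_name: str) -> bool:
    """True for names ending in ffn.shared_experts.{gate_up_proj,down_proj}."""
    parts = module_name.split(".")
    if len(parts) < 3 or parts[-1] not in SHARED_EXPERT_PROJS:
        return False
    return parts[-3:-1] == ["ffn", "shared_experts"]


def to_bf16(x: float) -> float:
    """Round to BF16 via float32: round-to-nearest-even on the dropped 16 bits.

    NaN inputs are not handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dequant_nvfp4_row(packed: bytes, block_scales: list,
                      global_scale: float, block_size: int = 16) -> list:
    """Expand one packed NVFP4 row to BF16-representable floats.

    Assumed layout: low nibble holds the earlier element, and each element
    is scaled by its block scale times the per-tensor global scale.
    """
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # assumed: first element in the low nibble
        codes.append(byte >> 4)
    if len(block_scales) * block_size < len(codes):
        raise ValueError("not enough block scales for this row")
    return [
        to_bf16(E2M1_LUT[c] * block_scales[i // block_size] * global_scale)
        for i, c in enumerate(codes)
    ]


if __name__ == "__main__":
    names = [
        "model.layers.3.ffn.shared_experts.gate_up_proj",
        "model.layers.3.ffn.shared_experts.down_proj",
        "model.layers.3.ffn.experts.0.down_proj",
    ]
    print([n for n in names if needs_bf16_dequant(n)])
    # 0x21 -> codes 1, 2 -> 0.5, 1.0; 0xF7 -> codes 7, 15 -> 6.0, -6.0
    print(dequant_nvfp4_row(bytes([0x21, 0xF7]), [2.0], 0.5))  # [0.5, 1.0, 6.0, -6.0]
```

Only the name filter reflects the note directly; the packing order and scale layout should be checked against the actual checkpoint before relying on this. Wiring the dequantized rows into a BF16 weight and an unquantized GEMM is model-specific and not shown.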