nvfp4-megamoe-kernel/vllm
biondizzle 1836e5fdc7 Add shared experts to post-quant BF16 dequant fix
Shared experts also use FlashInferCutlassNvFp4LinearKernel with
broken input_scale. They need the same BF16 dequant treatment.
This applies to gate_up_proj and down_proj on ffn.shared_experts.
2026-05-18 19:27:49 +00:00
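The sketch below shows what dequantizing an NVFP4 weight to BF16 involves in general. It is not the code from this commit. It assumes the common NVFP4 layout: signed E2M1 (FP4) codes packed two per byte with the low nibble first, one scale per 16-element block, and a per-tensor global scale that is multiplied in. Real checkpoints store block scales as FP8 E4M3. Here they are passed in as already-decoded floats to keep the example short. The function names are hypothetical. Python has no native bfloat16, so BF16 is emulated by rounding float32 to its top 16 bits.

```python
import struct

# Assumption: magnitudes of signed E2M1 codes, indexed by the 3 low bits (bit 3 is the sign).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed NVFP4 block size: one scale per 16 consecutive elements


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code into a Python float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)          # ties go to even
    bits = ((bits + rounding_bias) >> 16) << 16          # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dequant_nvfp4_to_bf16(packed: bytes, block_scales: list[float],
                          global_scale: float) -> list[float]:
    """Hypothetical helper: unpack FP4 codes, apply block and global scales, round to BF16."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F))  # low nibble first (assumed packing order)
        values.append(decode_e2m1(byte >> 4))
    if len(values) != len(block_scales) * BLOCK_SIZE:
        raise ValueError("expected exactly one block scale per 16 unpacked values")
    return [
        to_bf16(v * block_scales[i // BLOCK_SIZE] * global_scale)
        for i, v in enumerate(values)
    ]


if __name__ == "__main__":
    # 8 bytes -> 16 FP4 values; 0x21 unpacks to 0.5 (low nibble) then 1.0 (high nibble).
    print(dequant_nvfp4_to_bf16(bytes([0x21] * 8), [2.0], 0.5))
```

Dequantizing once up front trades the memory savings of FP4 for a plain BF16 matmul on these layers. That is a reasonable fallback when the quantized kernel's activation path (here, the input_scale) cannot be trusted.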