[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Author: Wentao Ye
Date: 2025-11-05 20:21:08 -05:00
Committed by: GitHub
parent 90189c71a9
commit d71af5f502
2 changed files with 16 additions and 8 deletions
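The optimization named in the title runs the dense shared_experts computation concurrently with the router and routed-expert path, which is independent of the router output and can therefore hide latency under TP + EP. Below is a minimal sketch of that idea using a side CUDA stream; the function and module names (`moe_forward_overlapped`, `shared_experts`, `router`, `routed_experts`) are illustrative placeholders, not the actual vLLM implementation, and it assumes a CUDA device.

```python
import torch


def moe_forward_overlapped(
    hidden_states: torch.Tensor,
    shared_experts: torch.nn.Module,
    router: torch.nn.Module,
    routed_experts: torch.nn.Module,
) -> torch.Tensor:
    """Sketch: overlap shared_experts with the router + routed-expert path."""
    side_stream = torch.cuda.Stream()
    # Order the side stream after any pending work that produced hidden_states.
    side_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side_stream):
        # The dense shared-experts GEMMs do not depend on the router output,
        # so they can run while the main stream handles routing (and, under
        # TP + EP, the expert dispatch).
        shared_out = shared_experts(hidden_states)
    router_logits = router(hidden_states)
    routed_out = routed_experts(hidden_states, router_logits)
    # Re-join the streams before combining the two partial outputs.
    torch.cuda.current_stream().wait_stream(side_stream)
    # Tell the caching allocator shared_out is now used on the main stream.
    shared_out.record_stream(torch.cuda.current_stream())
    return shared_out + routed_out
```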

@@ -1178,7 +1178,7 @@ class FusedMoE(CustomOp):
         hidden_size: Input hidden state size of the transformer
         intermediate_size: Intermediate size of the experts
         params_dtype: Data type for the parameters.
-        reduce_results: Whether to all all_reduce on the output of the layer
+        reduce_results: Whether to all_reduce on the output of the layer
         renormalize: Whether to renormalize the logits in the fused_moe kernel
         quant_config: Quantization configure.
         enable_eplb: Whether to enable expert parallelism load balancer.
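For context on the docstring line the diff touches: under tensor parallelism each rank holds only a partial sum of the layer output, and `reduce_results` controls whether the layer sums those partials across ranks. A minimal sketch of that semantics with `torch.distributed` follows; `maybe_reduce` is a hypothetical helper, not the vLLM code path.

```python
import torch
import torch.distributed as dist


def maybe_reduce(partial_output: torch.Tensor, reduce_results: bool) -> torch.Tensor:
    """Sketch: sum partial layer outputs across tensor-parallel ranks."""
    if reduce_results and dist.is_initialized():
        # Each rank contributes its partial sum; after this call every rank
        # holds the full layer output.
        dist.all_reduce(partial_output, op=dist.ReduceOp.SUM)
    return partial_output
```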