[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Author: Wentao Ye
Date: 2025-11-05 20:21:08 -05:00
Committed by: GitHub
parent 90189c71a9
commit d71af5f502
2 changed files with 16 additions and 8 deletions
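The optimization named in the title runs the dense shared_experts computation concurrently with the router and routed-expert path, which is independent of the router output and can therefore hide latency under TP + EP. Below is a minimal sketch of that idea using a side CUDA stream; the function and module names (`moe_forward_overlapped`, `shared_experts`, `router`, `routed_experts`) are illustrative placeholders, not the actual vLLM implementation, and it assumes a CUDA device.

```python
import torch


def moe_forward_overlapped(
    hidden_states: torch.Tensor,
    shared_experts: torch.nn.Module,
    router: torch.nn.Module,
    routed_experts: torch.nn.Module,
) -> torch.Tensor:
    """Sketch: overlap shared_experts with the router + routed-expert path."""
    side_stream = torch.cuda.Stream()
    # Order the side stream after any pending work that produced hidden_states.
    side_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side_stream):
        # The dense shared-experts GEMMs do not depend on the router output,
        # so they can run while the main stream handles routing (and, under
        # TP + EP, the expert dispatch).
        shared_out = shared_experts(hidden_states)
    router_logits = router(hidden_states)
    routed_out = routed_experts(hidden_states, router_logits)
    # Re-join the streams before combining the two partial outputs.
    torch.cuda.current_stream().wait_stream(side_stream)
    # Tell the caching allocator shared_out is now used on the main stream.
    shared_out.record_stream(torch.cuda.current_stream())
    return shared_out + routed_out
```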

@@ -1178,7 +1178,7 @@ class FusedMoE(CustomOp):
         hidden_size: Input hidden state size of the transformer
         intermediate_size: Intermediate size of the experts
         params_dtype: Data type for the parameters.
-        reduce_results: Whether to all all_reduce on the output of the layer
+        reduce_results: Whether to all_reduce on the output of the layer
         renormalize: Whether to renormalize the logits in the fused_moe kernel
         quant_config: Quantization configure.
         enable_eplb: Whether to enable expert parallelism load balancer.
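For context on the docstring line the diff touches: under tensor parallelism each rank holds only a partial sum of the layer output, and `reduce_results` controls whether the layer sums those partials across ranks. A minimal sketch of that semantics with `torch.distributed` follows; `maybe_reduce` is a hypothetical helper, not the vLLM code path.

```python
import torch
import torch.distributed as dist


def maybe_reduce(partial_output: torch.Tensor, reduce_results: bool) -> torch.Tensor:
    """Sketch: sum partial layer outputs across tensor-parallel ranks."""
    if reduce_results and dist.is_initialized():
        # Each rank contributes its partial sum; after this call every rank
        # holds the full layer output.
        dist.all_reduce(partial_output, op=dist.ReduceOp.SUM)
    return partial_output
```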