[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
@@ -1178,7 +1178,7 @@ class FusedMoE(CustomOp):
         hidden_size: Input hidden state size of the transformer
         intermediate_size: Intermediate size of the experts
         params_dtype: Data type for the parameters.
-        reduce_results: Whether to all all_reduce on the output of the layer
+        reduce_results: Whether to all_reduce on the output of the layer
         renormalize: Whether to renormalize the logits in the fused_moe kernel
         quant_config: Quantization configure.
         enable_eplb: Whether to enable expert parallelism load balancer.
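For context, the arguments documented above belong to the constructor of vLLM's FusedMoE layer. Below is a minimal sketch of how such a layer might be instantiated; the values are illustrative assumptions, the num_experts/top_k arguments are not shown in this hunk, and the exact import path and signature may vary across vLLM versions.

# Hypothetical FusedMoE construction using the parameters documented above.
# All values are illustrative; only the parameter names come from the diff.
import torch
from vllm.model_executor.layers.fused_moe import FusedMoE

moe = FusedMoE(
    num_experts=8,                # assumed: not shown in this hunk
    top_k=2,                      # assumed: not shown in this hunk
    hidden_size=4096,             # input hidden state size of the transformer
    intermediate_size=14336,      # intermediate size of the experts
    params_dtype=torch.bfloat16,  # data type for the parameters
    reduce_results=True,          # all_reduce on the output of the layer
    renormalize=True,             # renormalize logits in the fused_moe kernel
    quant_config=None,            # quantization config (None = unquantized)
    enable_eplb=False,            # expert-parallelism load balancer
)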