[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@@ -162,3 +162,12 @@ deepseek_v3_fp8 = ModelFusionInfo(
         # async_tp=n_layers * 2,
     ),
 )
+
+gpt_oss_20b = ModelFusionInfo(
+    model_name="openai/gpt-oss-20b",
+    matches=lambda n_layers: Matches(
+        ar_rms_fusion=n_layers * 2 + 1,
+        sequence_parallel=n_layers * 2 + 1,
+        async_tp=n_layers * 2,
+    ),
+)
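
For readers unfamiliar with how the `matches` lambda is consumed, here is a minimal sketch of evaluating the new entry for a given layer count. The `Matches` and `ModelFusionInfo` classes below are hypothetical stand-ins written for illustration; the real definitions in the repository are not shown in this diff and may carry more fields or different defaults. The layer count of 4 is also only an example value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Matches:
    # Hypothetical stand-in: expected match count for each fusion pass.
    ar_rms_fusion: int = 0
    sequence_parallel: int = 0
    async_tp: int = 0


@dataclass(frozen=True)
class ModelFusionInfo:
    # Hypothetical stand-in: a model name plus a function of the layer count.
    model_name: str
    matches: Callable[[int], Matches]


# Same entry as in the diff above, built on the stand-in classes.
gpt_oss_20b = ModelFusionInfo(
    model_name="openai/gpt-oss-20b",
    matches=lambda n_layers: Matches(
        ar_rms_fusion=n_layers * 2 + 1,
        sequence_parallel=n_layers * 2 + 1,
        async_tp=n_layers * 2,
    ),
)

# Evaluate the expected counts for an illustrative layer count of 4.
expected = gpt_oss_20b.matches(4)
print(expected)
```

Under these assumptions, the expected counts scale linearly with the layer count: two matches per layer for every pass, plus one additional match for `ar_rms_fusion` and `sequence_parallel`.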