vllm/vllm/model_executor/layers/fused_moe/configs at 77a73458e3ae8b5b7a2a13f78d3a6b4d39b1414d - vllm
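
The file names in this directory follow a visible pattern: comma-separated `key=value` fields (`E`, `N`, `device_name`, and optionally `dtype` and `block_shape`) followed by `.json`. Below is a minimal sketch of splitting such a name into its fields. The helper `parse_config_name` is hypothetical and not part of vLLM, and the listing itself does not state what each field means, so the sketch only recovers the values as written. It assumes Python 3.9+ for `str.removesuffix`.

```python
import re


def parse_config_name(filename: str) -> dict:
    """Hypothetical helper (not a vLLM API): split a config file name
    from this listing into its key=value fields."""
    stem = filename.removesuffix(".json")

    # Split only on commas that are followed by "<key>=", so the comma
    # inside block_shape=[128,128] is left alone.
    fields = {}
    for part in re.split(r",(?=[A-Za-z_]+=)", stem):
        key, _, value = part.partition("=")
        fields[key] = value

    parsed = {
        "E": int(fields["E"]),
        "N": int(fields["N"]),
        "device_name": fields["device_name"],
        "dtype": fields.get("dtype"),  # None when the name has no dtype field
        "block_shape": None,           # None when the name has no block_shape field
    }
    if "block_shape" in fields:
        dims = fields["block_shape"].strip("[]").split(",")
        parsed["block_shape"] = [int(d) for d in dims]
    return parsed


if __name__ == "__main__":
    name = "E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json"
    print(parse_config_name(name))
```

Names without a `dtype` or `block_shape` field, such as `E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json`, come back with those entries set to `None`.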

E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=1792,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=3072,device_name=NVIDIA_H200,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=1792,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=1792,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json

[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700 )

2024-03-28 15:26:24 -07:00

E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=1792,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=2048,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=2048,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Fused MoE Config for Mixtral 8x22 (#4002 )

2024-04-11 07:50:00 -07:00

E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] Fused MoE Config for Mixtral 8x22 (#4002 )

2024-04-11 07:50:00 -07:00

E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504 )

2025-08-24 18:06:35 -07:00

E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=2048,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=3584,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577 )

2025-02-20 04:01:02 +00:00

E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=3584,device_name=AMD_Instinct_MI325X.json

[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346 )

2026-01-22 13:55:25 +08:00

E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json

[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700 )

2024-03-28 15:26:24 -07:00

E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=3584,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=3584,device_name=NVIDIA_L40S.json

[Kernel] adding fused moe kernel config for L40S TP4 (#9245 )

2024-10-11 08:54:22 -07:00

E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=4096,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=4096,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Fused MoE Config for Mixtral 8x22 (#4002 )

2024-04-11 07:50:00 -07:00

E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] Fused MoE Config for Mixtral 8x22 (#4002 )

2024-04-11 07:50:00 -07:00

E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=4096,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=7168,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577 )

2025-02-20 04:01:02 +00:00

E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=7168,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389 )

2024-03-14 08:11:48 +00:00

E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=7168,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=8192,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=8192,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=14336,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577 )

2025-02-20 04:01:02 +00:00

E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=14336,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] W8A16 Int8 inside FusedMoE (#7415 )

2024-08-16 10:06:51 -07:00

E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=14336,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=16384,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408 )

2025-01-25 12:17:19 +08:00

E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=8,N=16384,device_name=AMD_Instinct_MI325X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Model] Adding support for MSFT Phi-3.5-MoE (#7729 )

2024-08-30 13:42:57 -06:00

E=16,N=1024,device_name=AMD_Instinct_MI300X.json

Upstream Llama4 Support to Main (#16113 )

2025-04-07 08:06:27 -07:00

E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json

[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516 )

2025-07-06 09:20:11 +00:00

E=16,N=1024,device_name=NVIDIA_B200.json

Add Triton Fused MoE kernel config for E=16 on B200 (#19518 )

2025-06-12 04:31:51 +00:00

E=16,N=1024,device_name=NVIDIA_H100.json

[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488 )

2025-04-12 06:26:55 +08:00

E=16,N=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=16,N=1024,device_name=NVIDIA_H200.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json

[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700 )

2024-03-28 15:26:24 -07:00

E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679 )

2024-03-27 22:22:25 -07:00

E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] DBRX Triton MoE kernel H100 (#3692 )

2024-03-28 10:05:34 -07:00

E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=16,N=2048,device_name=NVIDIA_H200.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679 )

2024-03-27 22:22:25 -07:00

E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] DBRX Triton MoE kernel H100 (#3692 )

2024-03-28 10:05:34 -07:00

E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=3072,device_name=NVIDIA_H200,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Model] Adding support for MSFT Phi-3.5-MoE (#7729 )

2024-08-30 13:42:57 -06:00

E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=4096,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=16,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Model] Adding support for MSFT Phi-3.5-MoE (#7729 )

2024-08-30 13:42:57 -06:00

E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )

2024-08-16 11:37:01 -07:00

E=16,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json

[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 )

2025-09-30 07:30:44 -07:00

E=20,N=1536,device_name=NVIDIA_RTX_PRO_6000_Blackwell_Server_Edition,dtype=fp8_w8a8.json

chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240 )

2025-11-22 08:42:48 -08:00

E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json

[Bugfix] Fix Qwen3-coder moe tuned config (#24072 )

2025-09-07 05:19:46 +00:00

E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=32,N=1408,device_name=NVIDIA_B200.json

[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818 )

2025-10-14 11:20:39 -07:00

E=32,N=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[deepseek] add EP8 FusedMOE config for H200 and B200 (#26331 )

2025-10-07 10:38:54 -07:00

E=32,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

[deepseek] add EP8 FusedMOE config for H200 and B200 (#26331 )

2025-10-07 10:38:54 -07:00

E=40,N=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8.json

[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818 )

2025-10-14 11:20:39 -07:00

E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=60,N=176,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=60,N=352,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=60,N=704,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=60,N=1408,device_name=AMD_Instinct_MI300X.json

[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503 )

2025-02-18 22:23:24 -08:00

E=62,N=128,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=62,N=256,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] adding fused_moe configs for upcoming granite4 (#21332 )

2025-07-24 20:16:59 -07:00

E=62,N=512,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] adding fused_moe configs for upcoming granite4 (#21332 )

2025-07-24 20:16:59 -07:00

E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=320,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=384,device_name=NVIDIA_H20.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497 )

2024-06-13 09:01:10 -07:00

E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497 )

2024-06-13 09:01:10 -07:00

E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=640,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=768,device_name=NVIDIA_H20.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=768,device_name=NVIDIA_H100_PCIe,dtype=fp8_w8a8,block_shape=[128,128].json

[Perf] Add H100 fused MoE config (#25398 )

2025-10-18 02:21:27 +00:00

E=64,N=896,device_name=NVIDIA_H20.json

[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763 )

2025-05-08 07:50:22 +00:00

E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json

[Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497 )

2024-06-13 09:01:10 -07:00

E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497 )

2024-06-13 09:01:10 -07:00

E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=1280,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=1408,device_name=NVIDIA_B200.json

[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818 )

2025-10-14 11:20:39 -07:00

E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=2560,device_name=NVIDIA_H200.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=3072,device_name=NVIDIA_H20.json

[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 )

2025-07-17 09:10:09 +00:00

E=64,N=8960,device_name=NVIDIA_H100_80GB_HBM3,dtype=bf16.json

[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268 )

2025-10-20 07:48:01 -07:00

E=64,N=8960,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268 )

2025-10-20 07:48:01 -07:00

E=72,N=192,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=72,N=384,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] adding fused_moe configs for upcoming granite4 (#21332 )

2025-07-24 20:16:59 -07:00

E=72,N=768,device_name=AMD_Instinct_MI300X.json

[Kernel] MI-300X triton moe configs (#23445 )

2025-09-22 14:29:54 +00:00

E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json

[Kernel] adding fused_moe configs for upcoming granite4 (#21332 )

2025-07-24 20:16:59 -07:00

E=128,N=96,device_name=NVIDIA_H20.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=96,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457 )

2026-02-27 13:51:14 +08:00

E=128,N=96,device_name=NVIDIA_H200.json

[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457 )

2026-02-27 13:51:14 +08:00

E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json

[Qwen3]add qwen3-235b-bf16 fused moe config on A100 (#17715 )

2025-05-07 23:09:32 -07:00

E=128,N=192,device_name=NVIDIA_H20-3e.json

Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315 )

2025-06-08 16:07:02 +08:00

E=128,N=192,device_name=NVIDIA_H20.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=192,device_name=NVIDIA_H200.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

Add glm4.5v tp2,4 fp8 config on H100_80GB (#23443 )

2025-08-23 02:54:19 +00:00

E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json

[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config (#17535 )

2025-05-01 06:37:17 -07:00

E=128,N=384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Thinking Fused MoE kernels Optimization configs (#24330 )

2025-09-07 03:18:54 +00:00

E=128,N=384,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Thinking Fused MoE kernels Optimization configs (#24330 )

2025-09-07 03:18:54 +00:00

E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json

Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401 )

2025-06-11 07:23:57 +08:00

E=128,N=384,device_name=NVIDIA_H20-3e.json

Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315 )

2025-06-08 16:07:02 +08:00

E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=384,device_name=NVIDIA_H20.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=384,device_name=NVIDIA_H200.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=128,N=512,device_name=NVIDIA_B200,dtype=fp8_w8a8.json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=128,N=512,device_name=NVIDIA_B200.json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=128,N=512,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=128,N=512,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Add Mistral Large 3 and Ministral 3 (#29757 )

2025-12-02 10:29:00 +00:00

E=128,N=512,device_name=NVIDIA_H200.json

Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 )

2026-01-30 22:48:27 -08:00

E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json

feat: add triton fused moe config for GLM-4.5-Air-FP8 on B200 (#23695 )

2025-08-26 18:06:10 -07:00

E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

Add glm4.5v tp2,4 fp8 config on H100_80GB (#23443 )

2025-08-23 02:54:19 +00:00

E=128,N=704,device_name=NVIDIA_RTX_PRO_6000_Blackwell_Workstation_Edition,dtype=fp8_w8a8.json

Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407 )

2025-12-28 08:38:33 -08:00

E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json

[FEAT] [ROCm]: Add Qwen/Qwen3-30B-A3B-FP8 fused moe config for MI300X (#17530 )

2025-05-01 06:03:13 -07:00

E=128,N=768,device_name=AMD_Instinct_MI308X.json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Thinking Fused MoE kernels Optimization configs (#24330 )

2025-09-07 03:18:54 +00:00

E=128,N=768,device_name=NVIDIA_B200.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 (#31448 )

2025-12-28 08:38:07 -08:00

E=128,N=768,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Thinking Fused MoE kernels Optimization configs (#24330 )

2025-09-07 03:18:54 +00:00

E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json

Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401 )

2025-06-11 07:23:57 +08:00

E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=768,device_name=NVIDIA_H20.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=768,device_name=NVIDIA_H200.json

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00

E=128,N=768,device_name=Radeon_8060S_Graphics,dtype=int4_w4a16.json

[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130 )

2026-02-13 00:14:27 -08:00

E=128,N=928,device_name=NVIDIA_H100_80GB_HBM3.json

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 )

2025-11-04 12:59:43 +00:00

E=128,N=928,device_name=NVIDIA_L40S.json

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 )

2025-11-04 12:59:43 +00:00

E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847 )

2025-04-19 06:21:43 +00:00

E=128,N=1024,device_name=AMD_Instinct_MI300X.json

[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 )

2025-05-02 19:44:19 +00:00

E=128,N=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json

[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709 )

2025-11-15 01:10:48 -08:00

E=128,N=1024,device_name=NVIDIA_H100,dtype=fp8_w8a8.json

Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302 )

2025-09-23 18:07:42 -06:00

E=128,N=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=128,N=1024,device_name=NVIDIA_H200.json

Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 )

2025-10-14 14:21:03 -07:00

E=128,N=1856,device_name=NVIDIA_B200.json

Add Triton fused MoE config for B200 (Nemotron Nano) (#32804 )

2026-01-29 19:21:33 +00:00

E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 )

2025-11-04 12:59:43 +00:00

E=128,N=1856,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Adding Nemotron fp8 Triton MoE Config (#34674 )

2026-02-24 15:56:38 -08:00

E=128,N=1856,device_name=NVIDIA_L40S.json

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 )

2025-11-04 12:59:43 +00:00

E=128,N=8960,device_name=NVIDIA_H100_80GB_HBM3,dtype=bf16.json

[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268 )

2025-10-20 07:48:01 -07:00

E=128,N=8960,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json

[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268 )

2025-10-20 07:48:01 -07:00

E=129,N=704,device_name=NVIDIA_RTX_PRO_6000_Blackwell_Workstation_Edition,dtype=fp8_w8a8.json

Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407 )

2025-12-28 08:38:33 -08:00

E=160,N=192,device_name=AMD_Instinct_MI300X.json

[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323 )

2025-10-27 22:58:06 -07:00

E=160,N=192,device_name=AMD_Instinct_MI350_OAM,dtype=fp8_w8a8.json

[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 )

2025-10-17 04:56:12 -07:00

E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=160,N=192,device_name=NVIDIA_B300_SXM6_AC,dtype=fp8_w8a8.json

tuned fused configs for B300 (#30629 )

2025-12-18 11:41:59 -08:00

E=160,N=192,device_name=NVIDIA_H20-3e.json

Add H20-3e fused MoE kernel tuning configs for GLM-4.5 (#22433 )

2025-08-07 00:24:47 -07:00

E=160,N=192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json

Add fused MoE config for H200 E160 N192 fp8 (#29182 )

2025-11-21 17:37:51 -08:00

E=160,N=320,device_name=NVIDIA_H20-3e.json

Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct (#21598 )

2025-07-25 02:36:55 -07:00

E=160,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json

[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703 )

2025-09-26 01:18:20 -07:00

E=160,N=384,device_name=AMD_Instinct_MI350_OAM,dtype=fp8_w8a8.json

[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 )

2025-10-17 04:56:12 -07:00

E=160,N=384,device_name=AMD_Instinct_MI355_OAM,dtype=fp8_w8a8.json

[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 )

2025-10-17 04:56:12 -07:00

E=160,N=384,device_name=NVIDIA_B200,dtype=fp8_w8a8.json

glm 4.6 fused tuned inference config for B200 (#32958 )

2026-02-08 18:55:47 +00:00

E=160,N=384,device_name=NVIDIA_B300_SXM6_AC,dtype=fp8_w8a8.json

tuned fused configs for B300 (#30629 )

2025-12-18 11:41:59 -08:00

E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json

QWEN3 Coder Fused MoE kernels Optimization configs (#24266 )

2025-09-04 20:33:43 +00:00

E=160,N=768,device_name=NVIDIA_B300_SXM6_AC,dtype=fp8_w8a8.json

tuned fused configs for B300 (#30629 )

2025-12-18 11:41:59 -08:00

E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json

[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753 )

2025-04-18 00:01:30 +08:00

E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json

[Misc][Easy] Remove the space from the file name

2025-02-05 19:23:35 -08:00

E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json

[core] moe fp8 block quant tuning support (#14068 )

2025-03-04 01:30:23 +00:00

E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json

MI325 configs, fused_moe_kernel bugfix (#14987 )

2025-03-18 08:05:18 -07:00

E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json

MI325 configs, fused_moe_kernel bugfix (#14987 )

2025-03-18 08:05:18 -07:00

E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json

Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205 )

2025-06-05 16:38:54 +00:00

E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json

Add files via upload: Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337 )

2025-05-19 09:49:57 -07:00

E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json

[Kernel] Add more tuned configs (#14877 )

2025-03-15 20:25:03 -07:00

E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

[Kernels] Improve H200 Fused MoE Config (#28992 )

2025-11-19 19:23:54 +00:00

E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json

[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322 )

2025-03-23 01:10:10 -07:00

E=256,N=384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json

[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200 )

2025-11-06 07:29:46 -08:00

E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json

MI325 configs, fused_moe_kernel bugfix (#14987 )

2025-03-18 08:05:18 -07:00

E=256,N=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=256,N=512,device_name=NVIDIA_H100_80GB_HBM3.json

[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739 )

2025-09-12 07:54:04 -07:00

E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json

MI325 configs, fused_moe_kernel bugfix (#14987 )

2025-03-18 08:05:18 -07:00

E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json

MI325 configs, fused_moe_kernel bugfix (#14987 )

2025-03-18 08:05:18 -07:00

E=384,N=128,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

Kimi K2 Fused MoE kernels Optimization configs (#24597 )

2025-09-10 23:06:16 -07:00

E=384,N=128,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

Kimi K2 Fused MoE kernels Optimization configs (#24597 )

2025-09-10 23:06:16 -07:00

E=384,N=128,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json

Kimi K2 Fused MoE kernels Optimization configs (#24597 )

2025-09-10 23:06:16 -07:00

E=384,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

Kimi K2 Fused MoE kernels Optimization configs (#24597 )

2025-09-10 23:06:16 -07:00

E=384,N=256,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json

Kimi K2 Fused MoE kernels Optimization configs (#24597 )

2025-09-10 23:06:16 -07:00

E=512,N=64,device_name=NVIDIA_A100-SXM4-80GB.json

[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740 )

2025-11-05 09:25:09 +08:00

E=512,N=64,device_name=NVIDIA_B200.json

[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698 )

2025-09-11 15:34:58 -07:00

E=512,N=64,device_name=NVIDIA_H20-3e.json

[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707 )

2025-09-12 10:06:26 +08:00

E=512,N=64,device_name=NVIDIA_H200.json

[Qwen3-Next] Add MoE Config for H200 (#24688 )

2025-09-11 12:40:15 -07:00

E=512,N=128,device_name=NVIDIA_A100-SXM4-80GB.json

[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740 )

2025-11-05 09:25:09 +08:00

E=512,N=128,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=512,N=128,device_name=NVIDIA_B200.json

[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698 )

2025-09-11 15:34:58 -07:00

E=512,N=128,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json

[QWEN NEXT] Fused MoE kernels Optimization configs (#24924 )

2025-09-16 13:06:03 +08:00

E=512,N=128,device_name=NVIDIA_H20-3e.json

[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707 )

2025-09-12 10:06:26 +08:00

E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json

[Qwen3-Next] MOE configs for H100 TP4 (#24699 )

2025-09-11 15:45:52 -07:00

E=512,N=128,device_name=NVIDIA_H200.json

[Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695 )

2025-09-11 14:38:16 -07:00

E=512,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=512,N=256,device_name=NVIDIA_B200.json

[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698 )

2025-09-11 15:34:58 -07:00

E=512,N=256,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json

[QWEN NEXT] Fused MoE kernels Optimization configs (#24924 )

2025-09-16 13:06:03 +08:00

E=512,N=256,device_name=NVIDIA_H20-3e.json

[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707 )

2025-09-12 10:06:26 +08:00

E=512,N=256,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json

[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 (#26887 )

2025-10-15 18:55:05 -07:00

E=512,N=256,device_name=NVIDIA_H100_80GB_HBM3.json

[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739 )

2025-09-12 07:54:04 -07:00

E=512,N=256,device_name=NVIDIA_H200.json

[Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695 )

2025-09-11 14:38:16 -07:00

E=512,N=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json

[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553 )

2025-12-18 13:16:04 +08:00

E=512,N=512,device_name=NVIDIA_B200.json

[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698 )

2025-09-11 15:34:58 -07:00

E=512,N=512,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json

[QWEN NEXT] Fused MoE kernels Optimization configs (#24924 )

2025-09-16 13:06:03 +08:00

E=512,N=512,device_name=NVIDIA_H20-3e.json

[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707 )

2025-09-12 10:06:26 +08:00

E=512,N=512,device_name=NVIDIA_H100_80GB_HBM3.json

[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739 )

2025-09-12 07:54:04 -07:00

E=512,N=512,device_name=NVIDIA_H200.json

[Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695 )

2025-09-11 14:38:16 -07:00

E=512,N=672,device_name=NVIDIA_B200.json

Add config file for fused MoE for Nemotron (TP4, B200) (#34411 )

2026-02-12 06:09:55 -08:00

E=512,N=1344,device_name=NVIDIA_B200.json

Add MoE config for Super B200 TP2 (#33510 )

2026-02-01 18:48:37 +00:00

README

[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 )

2025-04-28 15:20:24 -07:00
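
The file names in this listing follow a visible pattern: `E=<int>,N=<int>,device_name=<name>`, optionally followed by `,dtype=<name>` and `,block_shape=[<int>,<int>]`, then `.json`. The sketch below splits such a name into its fields. It is inferred only from the names shown here and is not vLLM's own lookup code; the function name `parse_config_name` and the returned dictionary layout are hypothetical. `E` and `N` are usually read as the expert count and the per-shard intermediate size, but that reading is an assumption.

```python
import re
from typing import Optional

# Pattern inferred from the file names in this listing (an assumption,
# not vLLM's own parser). block_shape contains a comma, so a plain
# split(",") would not work; a regex keeps the bracketed list together.
_CONFIG_NAME = re.compile(
    r"E=(?P<E>\d+),N=(?P<N>\d+),device_name=(?P<device_name>[^,]+?)"
    r"(?:,dtype=(?P<dtype>[^,]+?))?"
    r"(?:,block_shape=\[(?P<block_shape>\d+(?:,\d+)*)\])?"
    r"\.json"
)


def parse_config_name(filename: str) -> Optional[dict]:
    """Split a tuned-config file name into its fields.

    Returns None for names that do not follow the pattern (e.g. README).
    Missing optional fields (dtype, block_shape) are returned as None.
    """
    match = _CONFIG_NAME.fullmatch(filename)
    if match is None:
        return None
    block = match.group("block_shape")
    return {
        "E": int(match.group("E")),
        "N": int(match.group("N")),
        "device_name": match.group("device_name"),
        "dtype": match.group("dtype"),
        "block_shape": [int(x) for x in block.split(",")] if block else None,
    }


if __name__ == "__main__":
    # Two names taken from the listing above.
    for name in (
        "E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json",
        "E=512,N=64,device_name=NVIDIA_H20-3e.json",
    ):
        print(parse_config_name(name))
```

A parser like this can help when checking whether a tuned config exists for a given combination of expert count, shard size, device, and quantization scheme before running a new tuning pass.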