Files
vllm/vllm/model_executor/layers/fused_moe/configs
Amanzhol Salykov 8a24842765 [ROCm] add tuned moe_wna16_triton kernel configs for CDNA4 (#35093)
Signed-off-by: salykova <amsalykov@gmail.com>
Signed-off-by: amd-asalykov <asalykov@amd.com>
2026-03-11 19:00:08 +00:00
..

This directory contains tuned configurations for different settings of the fused_moe kernel.
For different settings of
- E (number of experts)
- N (intermediate size)
- device_name (torch.cuda.get_device_name())
the JSON file contains a mapping from M (batch size) to the chosen configuration.

The example configurations provided are for the Mixtral model for TP2 on H100
and TP4 on A100. Mixtral has intermediate size N = 14336, i.e. for TP2 we have
N = 7168 and for TP4 we have N = 3584.

See `benchmark/kernels/benchmark_moe.py` on how to generate these config files.