vllm/tests/evals/gsm8k/configs/moe-refactor/Mixtral-8x7B-BF16-fi-cutlass.yaml at 5719a4e4e601fb91274294d25370b7aad656d629

biondizzle/vllm

Linda 275e0d2a99 [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>

2026-02-11 12:38:11 +00:00



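# GSM8K accuracy check for Mixtral-8x7B in BF16 (per the filename, the FlashInfer CUTLASS MoE path).
model_name: "mistralai/Mixtral-8x7B-v0.1"
accuracy_threshold: 0.58  # accuracy bar for this run
num_questions: 1319       # size of the full GSM8K test split
num_fewshot: 5
server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2 --enable-expert-parallel"
# Environment variables set for the server process.
env:
  VLLM_USE_FLASHINFER_MOE_FP16: "1"
  VLLM_FLASHINFER_MOE_BACKEND: "throughput"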
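
A minimal Python sketch of how a test harness might consume this config. The helper names (`build_server_command`, `build_server_env`, `passes`) are hypothetical and are not taken from the vLLM test suite, and the `>=` comparison against `accuracy_threshold` is an assumption about how the threshold is applied. The config values are copied into a dict so the example runs without a YAML parser.

```python
import os
import shlex

# Values copied from the config above. A real harness would parse them
# from the YAML file instead of hard-coding them.
config = {
    "model_name": "mistralai/Mixtral-8x7B-v0.1",
    "accuracy_threshold": 0.58,
    "num_questions": 1319,
    "num_fewshot": 5,
    "server_args": (
        "--enforce-eager --max-model-len 8192 "
        "--tensor-parallel-size 2 --enable-expert-parallel"
    ),
    "env": {
        "VLLM_USE_FLASHINFER_MOE_FP16": "1",
        "VLLM_FLASHINFER_MOE_BACKEND": "throughput",
    },
}


def build_server_command(cfg):
    """Hypothetical helper: argv list for launching the server."""
    # shlex.split turns the single server_args string into separate tokens.
    return ["vllm", "serve", cfg["model_name"], *shlex.split(cfg["server_args"])]


def build_server_env(cfg, base_env=None):
    """Hypothetical helper: overlay the config's env block on a base environment."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(cfg.get("env", {}))
    return env


def passes(cfg, measured_accuracy):
    """Assumed rule: the run passes if accuracy meets or exceeds the threshold."""
    return measured_accuracy >= cfg["accuracy_threshold"]
```

The command and environment could then be handed to `subprocess.Popen(build_server_command(config), env=build_server_env(config))`. Keeping `server_args` as one string in the config and splitting it only at launch time keeps the YAML readable.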