vllm/tests/evals/gsm8k/configs/moe-refactor/Llama-4-Scout-BF16-fi-cutlass.yaml at 5719a4e4e601fb91274294d25370b7aad656d629

biondizzle/vllm

Linda 275e0d2a99 [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>

2026-02-11 12:38:11 +00:00

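# GSM8K eval config: Llama 4 Scout, BF16 MoE through FlashInfer (CUTLASS path, per the file name).
model_name: "meta-llama/Llama-4-Scout-17B-16E-Instruct"
accuracy_threshold: 0.92
num_questions: 1319  # size of the GSM8K test split
num_fewshot: 5
server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2 --enable-expert-parallel"
# Environment variables set for the server process.
env:
  VLLM_USE_FLASHINFER_MOE_FP16: "1"
  VLLM_FLASHINFER_MOE_BACKEND: "throughput"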
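
As a rough illustration of how a config like this could be consumed, the sketch below turns the fields into a `vllm serve` command line plus extra environment variables, and compares a measured accuracy against `accuracy_threshold`. The helper names `build_server_command` and `passes` are hypothetical, the dict simply mirrors the YAML values, and treating the threshold as an inclusive lower bound is an assumption. The actual test harness in the repository may work differently.

```python
import shlex

# Values copied from the config above; representing them as a plain dict
# (instead of parsing the YAML file) is an assumption made to keep the
# sketch dependency-free.
config = {
    "model_name": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "accuracy_threshold": 0.92,
    "num_questions": 1319,
    "num_fewshot": 5,
    "server_args": "--enforce-eager --max-model-len 8192 "
                   "--tensor-parallel-size 2 --enable-expert-parallel",
    "env": {
        "VLLM_USE_FLASHINFER_MOE_FP16": "1",
        "VLLM_FLASHINFER_MOE_BACKEND": "throughput",
    },
}


def build_server_command(cfg):
    """Hypothetical helper: return (argv, extra_env) for launching the server."""
    # shlex.split tokenizes the flag string the way a POSIX shell would.
    argv = ["vllm", "serve", cfg["model_name"], *shlex.split(cfg["server_args"])]
    extra_env = dict(cfg.get("env", {}))
    return argv, extra_env


def passes(measured_accuracy, cfg):
    """Hypothetical check: assume the threshold is an inclusive lower bound."""
    return measured_accuracy >= cfg["accuracy_threshold"]


argv, extra_env = build_server_command(config)
print(" ".join(argv))
for name, value in extra_env.items():
    print(f"{name}={value}")
```

To launch the server for real, `argv` could be passed to `subprocess.Popen` with `env={**os.environ, **extra_env}` so that the two FlashInfer variables reach the server process.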