vllm/tests/evals/gsm8k/configs/moe-refactor/Mixtral-8x7B-Fp8-AutoFp8-fi-cutlass.yaml
Robert Shaw d3e477c013 [MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-06 10:34:17 -05:00

# TODO(rob): enable
# model_name: "amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV"
# accuracy_threshold: 0.62
# num_questions: 1319
# num_fewshot: 5
# server_args: "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2"
# env:
#   VLLM_USE_FLASHINFER_MOE_FP8: "1"
#   VLLM_FLASHINFER_MOE_BACKEND: "throughput"
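
The source does not show how the eval harness consumes this file, so the following is only a minimal sketch of how the fields might map onto a server launch once the config is enabled. It assumes `server_args` is split shell-style and appended to a `vllm serve <model_name>` command, and that `env` is overlaid on the process environment. The `config` dict and the `build_launch` helper are hypothetical and exist only for illustration.

```python
import shlex

# Hypothetical in-memory copy of the commented-out config above.
config = {
    "model_name": "amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV",
    "accuracy_threshold": 0.62,
    "num_questions": 1319,
    "num_fewshot": 5,
    "server_args": "--enforce-eager --max-model-len 8192 --tensor-parallel-size 2",
    "env": {
        "VLLM_USE_FLASHINFER_MOE_FP8": "1",
        "VLLM_FLASHINFER_MOE_BACKEND": "throughput",
    },
}


def build_launch(cfg, base_env=None):
    """Illustrative helper (not part of vLLM): return (argv, env) for a launch."""
    # Assumption: server_args is a shell-style string of extra CLI flags.
    argv = ["vllm", "serve", cfg["model_name"], *shlex.split(cfg["server_args"])]
    # Assumption: config env values override anything already in the environment.
    env = dict(base_env or {})
    env.update(cfg.get("env", {}))
    return argv, env


argv, env = build_launch(config, base_env={"PATH": "/usr/bin"})
print(" ".join(argv))
```

The pair could then be handed to something like `subprocess.Popen(argv, env=env)`. The remaining fields (`accuracy_threshold`, `num_questions`, `num_fewshot`) would presumably be read by the evaluation step rather than by the server.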