# GPQA Evaluation using GPT-OSS

This directory contains GPQA evaluation tests using the GPT-OSS evaluation package and a vLLM server.
## Usage

Run the tests with pytest (as Buildkite does):

```bash
# H200
pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py \
  --config-list-file=configs/models-h200.txt

# B200
pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py \
  --config-list-file=configs/models-b200.txt
```
## Configuration Format

Model configs in the `configs/` directory use this YAML format:

```yaml
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568                  # Minimum expected accuracy
reasoning_effort: "low"                  # Reasoning effort level (default: "low")
server_args: "--tensor-parallel-size 2"  # Server arguments
startup_max_wait_seconds: 1800           # Max wait for server startup (default: 1800)
env:                                     # Environment variables (optional)
  SOME_VAR: "value"
```
The `server_args` field accepts any arguments that can be passed to `vllm serve`. The `env` field accepts a dictionary of environment variables to set for the server process.
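As a minimal sketch of how these two fields might be consumed, the following hypothetical helper (not the actual test harness code) expands a parsed config into a `vllm serve` command line plus the environment for the server process:

```python
import os
import shlex

def build_server_command(config: dict) -> tuple[list[str], dict[str, str]]:
    """Turn a parsed YAML config into (argv, env) for the server process.

    Hypothetical helper illustrating the config fields described above.
    """
    argv = ["vllm", "serve", config["model_name"]]
    # server_args is a single string of extra CLI arguments; split it
    # shell-style so quoted values survive.
    argv += shlex.split(config.get("server_args", ""))
    # env entries are layered on top of the current environment.
    env = {**os.environ, **config.get("env", {})}
    return argv, env

config = {
    "model_name": "openai/gpt-oss-20b",
    "server_args": "--tensor-parallel-size 2",
    "env": {"SOME_VAR": "value"},
}
argv, env = build_server_command(config)
# argv -> ["vllm", "serve", "openai/gpt-oss-20b", "--tensor-parallel-size", "2"]
```

Using `shlex.split` rather than `str.split` keeps quoted argument values intact.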
## Adding New Models

- Create a new YAML config file in the `configs/` directory
- Add the filename to the appropriate `models-*.txt` file
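As a sketch, a config for a new model might look like the following (the filename, model name, and threshold here are placeholders, not real entries):

```yaml
# configs/my-new-model.yaml (hypothetical filename)
model_name: "org/my-new-model"
metric_threshold: 0.50
reasoning_effort: "low"
server_args: "--tensor-parallel-size 1"
```

Its filename would then be added on its own line to, e.g., `models-h200.txt`.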
## Tiktoken Encoding Files

The tiktoken encoding files required by the vLLM server are automatically downloaded from OpenAI's public blob storage on first run:

- `cl100k_base.tiktoken`
- `o200k_base.tiktoken`

Files are cached in the `data/` directory. The `TIKTOKEN_ENCODINGS_BASE` environment variable is automatically set to point to this directory when running evaluations.