[MoE Refactor] Mxfp4 oracle rebased (#37128)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Yongye Zhu
2026-03-20 22:37:04 -05:00
committed by GitHub
parent c7f98b4d0a
commit 87bd91892f
18 changed files with 1707 additions and 1381 deletions


@@ -82,7 +82,7 @@ def test_mxfp4_loading_and_execution_moe(vllm_runner, model_case: ModelCase):
model_case.model_id,
tensor_parallel_size=model_case.tp,
load_format="dummy",
-cudagraph_capture_sizes=[16],
+compilation_config={"cudagraph_capture_sizes": [16]},
) as llm:
# Disabled as check_model is broken: https://github.com/vllm-project/vllm/pull/18465#issuecomment-3329880562
# def check_model(model):
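In this hunk, the CUDA-graph capture sizes are no longer passed as a top-level keyword argument. They are now nested inside a `compilation_config` dict. The sketch below shows only that kwargs reshaping, in plain Python with no vLLM import. `move_into_compilation_config` is a hypothetical helper, not a vLLM API. The `tensor_parallel_size=1` value is an assumed placeholder standing in for `model_case.tp`.

```python
from typing import Any


def move_into_compilation_config(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper (not part of vLLM): nest a top-level
    `cudagraph_capture_sizes` kwarg under `compilation_config`,
    mirroring the shape of the test change."""
    out = dict(kwargs)  # shallow copy so the caller's dict is left untouched
    if "cudagraph_capture_sizes" in out:
        sizes = out.pop("cudagraph_capture_sizes")
        # Merge into an existing compilation_config dict instead of overwriting it.
        cfg = dict(out.get("compilation_config") or {})
        cfg["cudagraph_capture_sizes"] = sizes
        out["compilation_config"] = cfg
    return out


# Assumed example kwargs; tensor_parallel_size=1 stands in for model_case.tp.
old_kwargs = {
    "tensor_parallel_size": 1,
    "load_format": "dummy",
    "cudagraph_capture_sizes": [16],
}
new_kwargs = move_into_compilation_config(old_kwargs)
print(new_kwargs)
```

The helper merges into any `compilation_config` that is already present, so other compilation settings passed by a caller are preserved. The capture-size list is carried over unchanged.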