Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771)

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Zhiyu
2025-09-19 15:40:33 -07:00
committed by GitHub
parent 711e912946
commit 431535b522
7 changed files with 82 additions and 22 deletions

@@ -964,6 +964,9 @@ class ModelConfig:
                 "modelopt",
                 "modelopt_fp4",
                 "petit_nvfp4",
+                # Ensure heavy backends are probed last to avoid unnecessary
+                # imports during override detection (e.g., MXFP4 imports Triton)
+                "mxfp4",
             ]
             quantization_methods = [
                 q for q in supported_quantization if q not in overrides
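
Note on the hunk above: a minimal sketch of why placing "mxfp4" last in the override list can avoid the heavy import. It assumes that override names are probed in list order and that probing stops at the first match; the diff shows only the list and the filter, so that loop is an assumption. The names `probe_order`, `detect`, `make_loader`, and `loaded` are hypothetical and are not vLLM APIs; each loader stands in for a backend import.

```python
from typing import Callable, Optional

Probe = Callable[[dict], Optional[str]]


def probe_order(overrides: list[str], supported: list[str]) -> list[str]:
    # Overrides come first, in preference order; everything else follows.
    rest = [q for q in supported if q not in overrides]
    return overrides + rest


def detect(order: list[str], loaders: dict[str, Callable[[], Probe]],
           quant_cfg: dict) -> Optional[str]:
    # Assumed behavior: probe each backend in order, stop at the first match.
    for name in order:
        probe = loaders[name]()  # stands in for a lazy backend import
        match = probe(quant_cfg)
        if match is not None:
            return match  # later backends are never loaded
    return None


loaded: list[str] = []


def make_loader(name: str) -> Callable[[], Probe]:
    def load() -> Probe:
        loaded.append(name)  # record the simulated import
        return lambda cfg: name if cfg.get("quant_method") == name else None
    return load


overrides = ["modelopt", "modelopt_fp4", "petit_nvfp4", "mxfp4"]
supported = ["awq", "mxfp4", "modelopt", "petit_nvfp4", "modelopt_fp4"]
order = probe_order(overrides, supported)
loaders = {name: make_loader(name) for name in order}
result = detect(order, loaders, {"quant_method": "modelopt_fp4"})
```

With this ordering, a checkpoint that matches an earlier override such as "modelopt_fp4" is resolved before the "mxfp4" loader is ever called, so the simulated heavy import does not happen.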