Add support for ModelOpt MXFP8 dense models (#33786)

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Committed by danisereb via GitHub, 2026-02-08 21:16:48 +02:00
parent 1ecfabe525
commit 084aa19f02
6 changed files with 375 additions and 14 deletions

@@ -17,6 +17,7 @@ following `quantization.quant_algo` values:
- `FP8_PER_CHANNEL_PER_TOKEN`: per-channel weight scale and dynamic per-token activation quantization.
- `FP8_PB_WO` (ModelOpt may emit `fp8_pb_wo`): block-scaled FP8 weight-only (typically 128×128 blocks).
- `NVFP4`: ModelOpt NVFP4 checkpoints (use `quantization="modelopt_fp4"`).
- `MXFP8`: ModelOpt MXFP8 checkpoints (use `quantization="modelopt_mxfp8"`).
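The list above pairs each ModelOpt `quant_algo` value with the vLLM `quantization` argument where the docs state one explicitly. A minimal sketch of that mapping follows; only the `NVFP4` and `MXFP8` targets come from the text above, and the assumption that the FP8 variants route to the generic `"modelopt"` backend is illustrative, not confirmed by this diff.

```python
# Illustrative mapping from a checkpoint's `quantization.quant_algo` value
# to the vLLM `quantization` argument. Only NVFP4 -> "modelopt_fp4" and
# MXFP8 -> "modelopt_mxfp8" are stated in the docs above; the FP8 entries
# assuming the generic "modelopt" backend are a hypothetical placeholder.
QUANT_ALGO_TO_VLLM_ARG = {
    "FP8": "modelopt",                        # assumption
    "FP8_PER_CHANNEL_PER_TOKEN": "modelopt",  # assumption
    "FP8_PB_WO": "modelopt",                  # assumption
    "NVFP4": "modelopt_fp4",                  # from the docs above
    "MXFP8": "modelopt_mxfp8",                # from the docs above
}

def vllm_quantization_arg(quant_algo: str) -> str:
    """Return the vLLM `quantization` argument for a ModelOpt quant_algo."""
    try:
        return QUANT_ALGO_TO_VLLM_ARG[quant_algo]
    except KeyError:
        raise ValueError(f"unsupported quant_algo: {quant_algo!r}") from None
```

For an MXFP8 checkpoint this would resolve to `quantization="modelopt_mxfp8"` when constructing the engine (e.g. `LLM(model=..., quantization=vllm_quantization_arg("MXFP8"))`).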
## Quantizing HuggingFace Models with PTQ