Quantizes only the MoE expert weights to NVFP4 and leaves the attention layers untouched. The comments document all available NVFP4 strategies. To add a new strategy, copy this file to model_opt_nvfp4_<strategy>.py.
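
The expert-only selection can be pictured as a name filter over the model's modules. The following is a minimal sketch of that idea, not the script's actual implementation: the patterns, the example module names, and the helper `should_quantize` are all hypothetical, and real module names depend on the model architecture.

```python
from fnmatch import fnmatch

# Hypothetical name patterns; adjust them to the module names of the real model.
EXPERT_PATTERNS = ["*.mlp.experts.*"]
SKIP_PATTERNS = ["*.self_attn.*", "*.mlp.gate"]


def should_quantize(module_name: str) -> bool:
    """Return True only for modules that look like MoE expert layers."""
    # Skip patterns win, so attention and router modules are never selected.
    if any(fnmatch(module_name, pattern) for pattern in SKIP_PATTERNS):
        return False
    return any(fnmatch(module_name, pattern) for pattern in EXPERT_PATTERNS)


# Example module names (assumed for illustration only).
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.experts.3.down_proj",
]
selected = [name for name in names if should_quantize(name)]
print(selected)
```

Checking the skip patterns first keeps attention and router modules in their original precision even if an expert pattern is later widened by mistake.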