DeepGEMM

biondizzle bbf9a5f46a feat: fold weight_scale_2 into block scales in NVFP4 transform

- transform_nvfp4_weights_for_mega_moe now accepts weight_scale_2
- Folds the global scale (weight_scale_2) into the block scales: UE4M3 * FP32 -> UE4M3
- Dequantize block scales to FP32, multiply by the global scale, clamp to [0, 448], re-quantize to UE4M3
- This is needed because the kernel only applies one level of block scaling
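
The folding steps listed above can be sketched in plain Python. This is an illustration only, not the code from this commit: the helper names `decode_ue4m3`, `encode_ue4m3`, and `fold_global_scale` are hypothetical, block scales are modeled as a list of 7-bit integer codes rather than tensors, and round-to-nearest with ties to even is an assumed rounding mode. It also assumes UE4M3 uses the same layout as the non-negative half of FP8 E4M3 (4 exponent bits, 3 mantissa bits, bias 7, maximum 448).

```python
def decode_ue4m3(code: int) -> float:
    """Decode an unsigned E4M3 code (bias 7, assumed layout) to a float."""
    exp, man = (code >> 3) & 0xF, code & 0x7
    if exp == 0:
        # Subnormal range: no implicit leading one.
        return (man / 8.0) * 2.0 ** -6
    return (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# Every finite code 0x00..0x7E; 0x7F is skipped because it encodes NaN
# in the E4M3 variant assumed here. Code 0x7E decodes to 448.0.
_FINITE_VALUES = [decode_ue4m3(c) for c in range(0x7F)]


def encode_ue4m3(value: float) -> int:
    """Clamp to [0, 448], then pick the nearest code (ties to even code)."""
    v = min(max(value, 0.0), 448.0)
    return min(
        range(len(_FINITE_VALUES)),
        key=lambda c: (abs(_FINITE_VALUES[c] - v), c & 1),
    )


def fold_global_scale(block_scale_codes, weight_scale_2):
    """Hypothetical helper: UE4M3 block scale * FP32 global scale -> UE4M3."""
    return [
        encode_ue4m3(decode_ue4m3(code) * weight_scale_2)
        for code in block_scale_codes
    ]


if __name__ == "__main__":
    # 1.0 * 2.0 -> 2.0; 448.0 * 2.0 saturates at 448.0.
    print(fold_global_scale([0x38, 0x7E], 2.0))
```

Because the result must fit back into UE4M3, folding is lossy at the extremes: products above 448 saturate, and products at or below half of the smallest subnormal (2^-9) round to zero in this sketch.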

2026-05-11 05:42:16 +00:00

__init__.py

feat: fold weight_scale_2 into block scales in NVFP4 transform

2026-05-11 05:42:16 +00:00