DeepGEMM

Files

biondizzle bbf9a5f46a feat: fold weight_scale_2 into block scales in NVFP4 transform

- transform_nvfp4_weights_for_mega_moe now accepts weight_scale_2
- Folds global scale into block scales: UE4M3 * FP32 -> UE4M3
- Dequantize to f32, multiply by global scale, clamp [0,448], re-quantize
- This is needed because the kernel only applies one level of block scaling

2026-05-11 05:42:16 +00:00
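
The folding step described in the commit message above (dequantize, multiply by the global scale, clamp to [0, 448], re-quantize) can be sketched in plain Python. This is an illustrative sketch only, not the repository's implementation: the helper names `decode_e4m3`, `quantize_ue4m3`, and `fold_global_scale` are hypothetical, the real `transform_nvfp4_weights_for_mega_moe` presumably operates on tensors rather than lists, and round-to-nearest-even is an assumed rounding mode.

```python
import bisect


def decode_e4m3(code: int) -> float:
    """Decode a non-negative E4M3 code (0x00..0x7E) to a Python float."""
    exp, man = code >> 3, code & 0x7
    if exp == 0:  # subnormal: no implicit leading 1
        return (man / 8.0) * 2.0 ** -6
    return (1.0 + man / 8.0) * 2.0 ** (exp - 7)  # exponent bias is 7


# All non-negative finite E4M3 values in increasing order (0x7F encodes NaN).
E4M3_VALUES = [decode_e4m3(c) for c in range(0x7F)]
E4M3_MAX = E4M3_VALUES[-1]  # 448.0


def quantize_ue4m3(x: float) -> int:
    """Clamp x to [0, 448] and return the nearest E4M3 code.

    Ties go to the even code (assumed rounding mode).
    """
    x = min(max(x, 0.0), E4M3_MAX)
    hi = bisect.bisect_left(E4M3_VALUES, x)
    if hi == 0:
        return 0
    lo = hi - 1
    d_lo = x - E4M3_VALUES[lo]
    d_hi = E4M3_VALUES[hi] - x
    if d_lo < d_hi:
        return lo
    if d_hi < d_lo:
        return hi
    return lo if lo % 2 == 0 else hi


def fold_global_scale(block_scale_codes, global_scale: float):
    """Fold an FP32 global scale into UE4M3 block scales.

    Each block scale is dequantized, multiplied by the global scale,
    clamped, and re-quantized, so only one level of scaling remains.
    """
    return [quantize_ue4m3(decode_e4m3(c) * global_scale)
            for c in block_scale_codes]


if __name__ == "__main__":
    # 0x38 encodes 1.0; folding in a global scale of 2.0 yields 2.0 (0x40).
    print(fold_global_scale([0x38, 0x30], 2.0))
```

Because the folded product is rounded back to 8 bits, the result is generally lossy, and any product above 448 saturates at the largest finite code.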

include/deep_gemm

feat: NVFP4 mega MoE kernel (scale_vec::4X, UE4M3 block scales)

2026-05-11 05:41:08 +00:00

legacy

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

mega

feat: fold weight_scale_2 into block scales in NVFP4 transform

2026-05-11 05:42:16 +00:00

testing

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

utils

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

2026-04-17 09:45:14 +08:00

__init__.py

Add various optimizations and Mega MoE benchmarks (#316)

2026-04-24 18:41:37 +08:00