nvfp4-megamoe-kernel

biondizzle f951d284e7 test: add CuTeDSL NVFP4 GEMM test using reference ScaledGroupedGemmKernel

Tests the NVIDIA reference kernel with our quantization pipeline:
1. Quantize BF16 → NVFP4 (our stage_activation logic)
2. Pad and swizzle scale factors (to_blocked)
3. Run ScaledGroupedGemmKernel (2Dx3D scenario)
4. Compare against BF16 matmul reference
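
Steps 1 and 4 above can be illustrated with a small NumPy sketch. It is not the repository's stage_activation code and it does not call ScaledGroupedGemmKernel. It assumes the usual NVFP4 layout (FP4 E2M1 elements with one scale per block of 16 along K), keeps the block scales in float32 instead of rounding them to FP8 E4M3, and skips the scale-factor padding and swizzle of step 2. The names fake_quantize_nvfp4 and rel_err are hypothetical, and a float32 matmul stands in for the BF16 reference.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1, the element type used by NVFP4.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK = 16  # assumed NVFP4 block size: one scale per 16 elements along K


def fake_quantize_nvfp4(x):
    """Quantize then dequantize a 2D array along its last axis (hypothetical helper)."""
    rows, cols = x.shape
    assert cols % BLOCK == 0, "last axis must be a multiple of the block size"
    blocks = x.astype(np.float32).reshape(rows, cols // BLOCK, BLOCK)

    # Per-block scale maps the block's largest magnitude onto the largest
    # E2M1 value (6.0). Assumption: kept in float32, not rounded to E4M3.
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0).astype(np.float32)

    # Snap each scaled magnitude to the nearest grid value, then restore sign.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    dequantized = np.sign(scaled) * E2M1_GRID[idx] * scale
    return dequantized.reshape(rows, cols)


rng = np.random.default_rng(0)
a = rng.standard_normal((8, 64)).astype(np.float32)   # activations, M x K
b = rng.standard_normal((64, 32)).astype(np.float32)  # weights, K x N

# Reference result (float32 here as a stand-in for the BF16 matmul).
ref = a @ b

# Both operands are quantized along K, so b is transposed before and after.
out = fake_quantize_nvfp4(a) @ fake_quantize_nvfp4(b.T).T

rel_err = float(np.linalg.norm(out - ref) / np.linalg.norm(ref))
print(f"relative error vs reference: {rel_err:.3f}")
```

A test of the real kernel would compare its output against the reference in the same way, with a tolerance chosen for 4-bit inputs rather than an exact match.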

Also adds cutedsl/moe.py module for the future pipeline integration.

2026-05-16 02:55:04 +00:00

nvfp4_megamoe_kernel

test: add CuTeDSL NVFP4 GEMM test using reference ScaledGroupedGemmKernel

2026-05-16 02:55:04 +00:00

nvfp4_megamoe_kernel.egg-info

feat: CUTLASS NVFP4 mega_moe kernel — slot-based L1/L2, source-first SF remap

2026-05-15 11:38:18 +00:00