nvfp4-megamoe-kernel

Files

biondizzle 3838561c19 fix: only suppress compile message, still warmup all layers

CuTeDSL caches kernels by (M, N, K) shape. Different layer shapes
(L1 vs L2, different expert counts) trigger new compiles, so we can't
skip the warmup call; we can only suppress the print spam.

Flag now gates the message, not the warmup.
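
A minimal sketch of the behavior this commit message describes, in plain Python and not taken from the patch itself. `ShapeKeyedCache`, `warmup`, `verbose`, and `compile_fn` are hypothetical names; the real CuTeDSL compile call is replaced by a stand-in callable. It shows a cache keyed by (M, N, K) where every layer shape is still warmed up and the flag only controls whether a message is printed.

```python
from typing import Callable, Dict, Iterable, Tuple

Shape = Tuple[int, int, int]  # (M, N, K)


class ShapeKeyedCache:
    """Hypothetical stand-in for a kernel cache keyed by GEMM shape."""

    def __init__(self, compile_fn: Callable[[Shape], object]) -> None:
        self._compile_fn = compile_fn
        self._kernels: Dict[Shape, object] = {}
        self.compile_count = 0

    def has(self, shape: Shape) -> bool:
        return shape in self._kernels

    def get(self, shape: Shape) -> object:
        # A shape that has not been seen before triggers a new compile.
        if shape not in self._kernels:
            self._kernels[shape] = self._compile_fn(shape)
            self.compile_count += 1
        return self._kernels[shape]


def warmup(
    cache: ShapeKeyedCache,
    layer_shapes: Iterable[Shape],
    verbose: bool = True,
    log: Callable[[str], None] = print,
) -> None:
    """Warm up every layer shape; `verbose` gates only the message."""
    for shape in layer_shapes:
        if verbose and not cache.has(shape):
            log(f"compiling kernel for shape {shape}")
        # The warmup call itself is never skipped.
        cache.get(shape)
```

With `verbose=False`, the cache still ends up holding one kernel per distinct shape; only the log output differs.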

2026-05-16 05:18:10 +00:00

patches

fix: only suppress compile message, still warmup all layers

2026-05-16 05:18:10 +00:00

nvfp4_cutedsl.py

fix: L1 gate/up split — intermediate_size is per-projection, not fused

2026-05-16 04:04:40 +00:00
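
As an illustration of the gate/up fix named above, here is a hedged sketch of the shape arithmetic when `intermediate_size` is the width of one projection and not of the fused pair. It is not the code from `nvfp4_cutedsl.py`. The helper names are hypothetical, and two things are assumptions: that the fused L1 output stores gate first and up second, and that L2 is the down projection.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (M, N, K)


def l1_l2_shapes(
    num_tokens: int, hidden_size: int, intermediate_size: int
) -> Tuple[Shape, Shape]:
    """Return assumed (M, N, K) shapes for the two expert GEMMs.

    intermediate_size is per projection, so the fused gate+up GEMM (L1)
    has N = 2 * intermediate_size. L2 is assumed to be the down projection.
    """
    l1 = (num_tokens, 2 * intermediate_size, hidden_size)
    l2 = (num_tokens, hidden_size, intermediate_size)
    return l1, l2


def split_gate_up(
    fused_row: Sequence[float], intermediate_size: int
) -> Tuple[List[float], List[float]]:
    """Split one fused L1 output row into gate and up halves.

    Assumes gate occupies the first intermediate_size entries.
    """
    if len(fused_row) != 2 * intermediate_size:
        raise ValueError(
            "fused row must have 2 * intermediate_size entries, "
            f"got {len(fused_row)}"
        )
    gate = list(fused_row[:intermediate_size])
    up = list(fused_row[intermediate_size:])
    return gate, up
```

Treating `intermediate_size` as the fused width would halve each projection, which is the kind of mismatch the length check in this sketch is meant to surface.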