nvfp4-megamoe-kernel

biondizzle f0c1be3ced fix: remove broken hc_head warmup (wrong tensor shape)

hc_head_fuse_tilelang expects fn with shape[0] = hc_mult (4), but the
warmup passed hc_mult*(2+hc_mult) (24). Since --enforce-eager disables
@torch.compile anyway, hc_head runs eagerly and does not need a warmup.
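
The shape mismatch described above can be checked with a few lines of arithmetic. This is only a sketch: the helper names expected_fn_rows and warmup_fn_rows are hypothetical and are not part of this repository; they just restate the two sizes quoted in the commit message, assuming hc_mult = 4.

```python
# Hypothetical helpers; they only mirror the arithmetic in the commit message
# and do not call the real hc_head_fuse_tilelang kernel.
def expected_fn_rows(hc_mult: int) -> int:
    """Leading dimension the commit says hc_head_fuse_tilelang expects."""
    return hc_mult


def warmup_fn_rows(hc_mult: int) -> int:
    """Leading dimension the removed warmup passed instead."""
    return hc_mult * (2 + hc_mult)


hc_mult = 4                            # value quoted in the commit message
expected = expected_fn_rows(hc_mult)   # 4
passed = warmup_fn_rows(hc_mult)       # 24
shapes_match = expected == passed      # False, hence the broken warmup
print(f"expected fn.shape[0]={expected}, warmup passed {passed}, match={shapes_match}")
```

The two sizes coincide only for a non-positive hc_mult, so for any realistic setting the old warmup would have hit the same mismatch.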

2026-05-16 10:11:34 +00:00

deepseek_v4_attention.py

feat: CUTLASS NVFP4 mega_moe kernel — slot-based L1/L2, source-first SF remap

2026-05-15 11:38:18 +00:00

deepseek_v4.py

fix: remove broken hc_head warmup (wrong tensor shape)

2026-05-16 10:11:34 +00:00