nvfp4-megamoe-kernel

Files

biondizzle 1836e5fdc7 Add shared experts to post-quant BF16 dequant fix

Shared experts also use FlashInferCutlassNvFp4LinearKernel with a
broken input_scale, so they need the same BF16 dequant treatment.
This applies to gate_up_proj and down_proj on ffn.shared_experts.

2026-05-18 19:27:49 +00:00
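The commit does not show the dequant step itself. The pure-Python sketch below illustrates what "BF16 dequant" of an NVFP4 weight involves, under stated assumptions. It assumes the common NVFP4 layout: two E2M1 (FP4) values packed per byte with the low nibble first, one FP8 E4M3 scale per 16-element block, and one FP32 per-tensor scale applied multiplicatively. The names `decode_e2m1`, `decode_e4m3` and `dequant_nvfp4` are hypothetical. A real patch would write a `torch.bfloat16` tensor rather than a list of Python floats.

```python
# Hypothetical sketch of NVFP4 -> float dequantization (not the repo's code).
# Assumed layout: 2 x E2M1 per byte (low nibble first), one E4M3 scale per
# block of `block_size` values, plus one per-tensor scale.

# Magnitudes of the 8 non-negative E2M1 codes (sign is bit 3).
E2M1_LUT = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(nibble: int) -> float:
    """Decode a 4-bit E2M1 value: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    mag = E2M1_LUT[nibble & 0x7]
    return -mag if nibble & 0x8 else mag


def decode_e4m3(byte: int) -> float:
    """Decode an FP8 E4M3 (fn variant) byte: exponent bias 7, no infinities."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")
    if exp == 0:  # subnormal
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)


def dequant_nvfp4(packed, block_scales, global_scale, block_size=16):
    """Unpack FP4 values and multiply each by its block scale and the global scale."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F))  # low nibble first (assumption)
        values.append(decode_e2m1(byte >> 4))
    return [
        v * decode_e4m3(block_scales[i // block_size]) * global_scale
        for i, v in enumerate(values)
    ]
```

Usage, with a block size of 2 to keep it small: `dequant_nvfp4([0x21, 0xF7], [0x38, 0x40], 0.5, block_size=2)` decodes the nibbles to `[0.5, 1.0, 6.0, -6.0]`, the block scales to `1.0` and `2.0`, and yields `[0.25, 0.5, 6.0, -6.0]`.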

patches

Add shared experts to post-quant BF16 dequant fix

2026-05-18 19:27:49 +00:00

nvfp4_cutedsl.py

HOTFIX: remove NaN checks from run(); torch.isnan().any() forces a CPU-GPU sync, which breaks CUDA graph capture

2026-05-17 22:28:32 +00:00
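Branching on `torch.isnan(x).any()` in Python (an `if`, `.item()` or `bool()`) copies the result to the host. That stalls the stream and is not permitted inside a CUDA graph capture. One sync-free alternative keeps the check on the device and reads it later, outside the captured region. The sketch below is only an illustration: `run_with_nan_flag` and its placeholder computation are hypothetical, and this is not the actual `run()` in `nvfp4_cutedsl.py`.

```python
import torch


def run_with_nan_flag(x: torch.Tensor, nan_flag: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a kernel's run() that records NaNs without syncing.

    `nan_flag` is a preallocated 0-dim bool tensor on the same device as `x`.
    """
    out = x * 2.0  # placeholder for the real kernel launch
    # isnan(...).any() yields a 0-dim bool tensor on the device; OR-ing it into
    # the flag in place is another device op. No Python branch, so no host sync.
    nan_flag.logical_or_(torch.isnan(out).any())
    return out
```

After a graph replay, the flag can be inspected with `nan_flag.item()` outside the capture. That call does sync, but only once, at a point where syncing is acceptable.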