nvfp4-megamoe-kernel

Files

biondizzle 2fc81ccac4 Revert to BF16 dequant for attention NVFP4 (input_scale fix was too early)

process_weights_after_loading sets input_global_scale_inv only AFTER
_convert_nvfp4_post_load has run, so the input_scale fix could not find
the attrs. Going back to the BF16 dequant approach. The zeros in the
dummy run are expected (attention_impl returns early with out.zero_()).
Still need to test with a real request under cudagraph_mode=NONE.
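
The ordering problem described above can be sketched in plain Python. This is only an illustration: the function and attribute names are taken from the commit message, but their bodies, the `Layer` class, and the value `0.5` are hypothetical stand-ins and not the real code.

```python
class Layer:
    """Hypothetical stand-in for a quantized attention layer."""


def _convert_nvfp4_post_load(layer):
    # Stand-in: the reverted fix tried to read the scale at this point.
    # getattr with a default shows what it would have found.
    return getattr(layer, "input_global_scale_inv", None)


def process_weights_after_loading(layer):
    # Stand-in: the attribute only comes into existence here.
    layer.input_global_scale_inv = 0.5  # placeholder value


layer = Layer()

# Order described in the commit message: conversion first, attribute second.
seen_during_convert = _convert_nvfp4_post_load(layer)
process_weights_after_loading(layer)
seen_afterwards = _convert_nvfp4_post_load(layer)

print(seen_during_convert)  # the attribute is not there yet
print(seen_afterwards)      # visible only after process_weights_after_loading
```

Under these assumptions, any fix that depends on input_global_scale_inv would have to run after process_weights_after_loading, which is why the earlier hook could not use it.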

2026-05-18 16:23:41 +00:00

patches

Revert to BF16 dequant for attention NVFP4 (input_scale fix was too early)

2026-05-18 16:23:41 +00:00

nvfp4_cutedsl.py

HOTFIX: remove NaN checks from run(): torch.isnan().any() forces a CPU-GPU sync, which breaks cudagraph

2026-05-17 22:28:32 +00:00
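
One way to keep a NaN check without a sync inside run() is to accumulate the result in a device-side flag and read it back later, outside the captured region. The sketch below is not the code in nvfp4_cutedsl.py: `run_with_deferred_nan_check`, `nan_flag`, and the `x * 2.0` stand-in kernel are hypothetical. It relies on the fact that `torch.isnan(x).any()` returns a tensor, and the host only has to wait when that tensor is converted to a Python value (for example in an `if`, or via `.item()`).

```python
import torch


def run_with_deferred_nan_check(x: torch.Tensor, nan_flag: torch.Tensor) -> torch.Tensor:
    """Hypothetical run(): records NaN presence without reading it on the host."""
    out = x * 2.0  # stand-in for the real kernel launch
    # The result stays on the device; no bool() or .item() is called here.
    nan_flag.logical_or_(torch.isnan(out).any())
    return out


device = "cuda" if torch.cuda.is_available() else "cpu"

# A persistent 0-dim flag, allocated once and reused across calls.
nan_flag = torch.zeros((), dtype=torch.bool, device=device)

clean = run_with_deferred_nan_check(torch.ones(4, device=device), nan_flag)
dirty = run_with_deferred_nan_check(
    torch.tensor([1.0, float("nan")], device=device), nan_flag
)

# The only host read happens here, after the hot path has finished.
saw_nan = bool(nan_flag.item())
print(saw_nan)
```

Whether such a flag is worth keeping in a production path is a separate question; removing the check entirely, as the hotfix does, is the simplest option.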