nvfp4-megamoe-kernel/vllm
biondizzle 2fc81ccac4 Revert to BF16 dequant for attention NVFP4 (input_scale fix was too early)
process_weights_after_loading sets input_global_scale_inv only AFTER
_convert_nvfp4_post_load runs, so the fix could not find the attributes
it needed. Going back to the BF16 dequant approach. The zeros in the
dummy run are expected (attention_impl returns early with out.zero_()).
Still need to test with a real request under cudagraph_mode=NONE.
2026-05-18 16:23:41 +00:00
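For context on what a "BF16 dequant" of NVFP4 weights involves, here is a minimal pure-Python sketch. It assumes the common NVFP4 layout: 4-bit E2M1 codes, one scale per 16-element block, and one per-tensor scale. Several details are assumptions and are not taken from this repository: the nibble order, block scales that are already decoded to floats (real NVFP4 stores them as FP8 E4M3), and the multiply-by-global-scale convention. dequant_nvfp4 and e2m1_to_float are hypothetical helpers, not vLLM's code path.

```python
# Hypothetical sketch of NVFP4 -> float dequantization (not vLLM's implementation).
# Assumptions: two E2M1 codes per byte, low nibble first; block scales already
# decoded to floats; dequantized value = code * block_scale * global_scale.

# Magnitudes for E2M1 codes 0b000..0b111 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # elements covered by one block scale


def e2m1_to_float(nibble: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_VALUES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def dequant_nvfp4(packed: bytes, block_scales: list[float], global_scale: float) -> list[float]:
    """Unpack FP4 codes and apply per-block and per-tensor scales.

    A real kernel would write the result as torch.bfloat16; this sketch
    returns Python floats to stay dependency-free.
    """
    values: list[float] = []
    for byte in packed:
        values.append(e2m1_to_float(byte & 0x0F))  # low nibble first (assumed)
        values.append(e2m1_to_float(byte >> 4))
    if len(values) != len(block_scales) * BLOCK_SIZE:
        raise ValueError("expected one block scale per 16 elements")
    return [v * block_scales[i // BLOCK_SIZE] * global_scale for i, v in enumerate(values)]


if __name__ == "__main__":
    # 8 bytes -> 16 codes -> one block; 0x21 packs codes 1 (0.5) and 2 (1.0).
    print(dequant_nvfp4(bytes([0x21] * 8), [2.0], 0.5))
```

Dequantizing once to BF16 trades memory for simplicity: the attention path can then run on ordinary BF16 weights without needing the input-scale attributes at conversion time.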