Files
nvfp4-megamoe-kernel/vllm/kernels/linear
biondizzle 98153002c0 Go back to torch.library.custom_op with correct GEMM impl
allow_in_graph doesn't work — Dynamo can't create proxies for Python
objects (the runner). The custom op approach requires only tensor args.

This time the GEMM impl correctly:
- Uses quantize_activation_nvfp4 for activation quantization
- Pads x_fp4 via uint8 + view(float4) for torch.zeros compat
- Assembles A-side scales with pad + swizzle
- Uses int32 expert_offsets (CuTeDSL requirement)
- Passes runner's pre-assembled mat_b, scale_b, gsb tensors
2026-05-19 01:24:41 +00:00
..