nvfp4-megamoe-kernel/dsv4/ops/quantize.py at cb2ca8591fb87dce58b5cd4e1ff3a4e36963efdf

Files

biondizzle c2e3d15633 NVFP4-1.1 integration: GPU-only quantize kernel + MoE pipeline wiring

- Add quantize_nvfp4.cu: BF16→FP4 GPU kernel (no CPU sync, warp shuffle amax)
- Add quantize_nvfp4_gpu() bridge in ops/quantize.py
- Fix deinterleave_quantize kernel path (dsv4/ops/kernels → dsv4/kernels/cuda)
- Wire GPU quantize into Nvfp4MoE._run_impl():
  - L1 input: quantize_nvfp4_gpu (replaces quantize_activation_nvfp4)
  - Fused SwiGLU L2: deinterleave_quantize_nvfp4_cuda (single kernel)
  - Non-fused L2: quantize_nvfp4_gpu
- Add test_nvfp4_gpu_quantize.py for both kernels

2026-05-25 16:19:07 +00:00

11 KiB

Raw Blame History

View Raw

11 KiB Raw Blame History

11 KiB

Raw Blame History