biondizzle
c2e3d15633
NVFP4-1.1 integration: GPU-only quantize kernel + MoE pipeline wiring
- Add quantize_nvfp4.cu: BF16→FP4 GPU kernel (no CPU sync, warp shuffle amax)
- Add quantize_nvfp4_gpu() bridge in ops/quantize.py
- Fix deinterleave_quantize kernel path (dsv4/ops/kernels → dsv4/kernels/cuda)
- Wire GPU quantize into Nvfp4MoE._run_impl():
- L1 input: quantize_nvfp4_gpu (replaces quantize_activation_nvfp4)
- Fused SwiGLU L2: deinterleave_quantize_nvfp4_cuda (single kernel)
- Non-fused L2: quantize_nvfp4_gpu
- Add test_nvfp4_gpu_quantize.py for both kernels
2026-05-25 16:19:07 +00:00
..
2026-05-21 17:30:44 +00:00
2026-05-21 23:31:58 +00:00
2026-05-21 17:30:44 +00:00
2026-05-21 23:11:09 +00:00
2026-05-21 17:30:44 +00:00
2026-05-21 17:30:44 +00:00
2026-05-21 17:30:44 +00:00
2026-05-25 16:19:07 +00:00
2026-05-21 23:31:58 +00:00
2026-05-21 21:54:05 +00:00
2026-05-21 17:30:44 +00:00