nvfp4-megamoe-kernel

Files

biondizzle c2e3d15633 NVFP4-1.1 integration: GPU-only quantize kernel + MoE pipeline wiring

- Add quantize_nvfp4.cu: BF16→FP4 GPU kernel (no CPU sync, warp shuffle amax)
- Add quantize_nvfp4_gpu() bridge in ops/quantize.py
- Fix deinterleave_quantize kernel path (dsv4/ops/kernels → dsv4/kernels/cuda)
- Wire GPU quantize into Nvfp4MoE._run_impl():
  - L1 input: quantize_nvfp4_gpu (replaces quantize_activation_nvfp4)
  - Fused SwiGLU L2: deinterleave_quantize_nvfp4_cuda (single kernel)
  - Non-fused L2: quantize_nvfp4_gpu
- Add test_nvfp4_gpu_quantize.py for both kernels
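
For orientation, the block-scaled FP4 quantization that the kernels above perform can be sketched in plain Python. This is a CPU reference sketch only, not the code in quantize_nvfp4.cu: the names `quantize_block` and `dequantize_block` are hypothetical, the scale is kept as a Python float rather than rounded to FP8 E4M3, ties go to the smaller magnitude instead of round-to-nearest-even, and no per-tensor global scale is applied. It assumes the usual NVFP4 layout of 16-element blocks with E2M1 values (sign bit plus magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6).

```python
# Magnitudes representable by E2M1 (2 exponent bits, 1 mantissa bit),
# indexed by their 3-bit code. Bit 3 of the 4-bit code is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # elements sharing one scale factor


def quantize_block(block):
    """Quantize one block of floats to (scale, list of 4-bit codes)."""
    assert len(block) == BLOCK
    # The per-block amax is what a GPU kernel would reduce with warp shuffles.
    amax = max(abs(v) for v in block)
    # Map the largest magnitude onto the largest E2M1 value (6.0).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        mag = abs(v) / scale
        # Nearest representable magnitude; ties resolve to the lower index.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
        sign = 8 if v < 0 else 0
        codes.append(sign | idx)
    return scale, codes


def dequantize_block(scale, codes):
    """Invert quantize_block up to rounding error."""
    return [(-1.0 if c & 8 else 1.0) * E2M1_VALUES[c & 7] * scale for c in codes]


if __name__ == "__main__":
    x = [6.0, -3.0, 0.5, 0.0] + [0.0] * 12
    scale, codes = quantize_block(x)
    print(scale, codes[:4], dequantize_block(scale, codes)[:4])
```

A reference like this is mainly useful as an oracle in tests such as test_nvfp4_gpu_quantize.py, where GPU output can be compared against it block by block, allowing for the rounding differences noted above.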

2026-05-25 16:19:07 +00:00

__init__.py

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

attention.py

Fix layer construction: match existing API signatures, add RMSNorm impl

2026-05-21 23:31:58 +00:00

embedding.py

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

ffn.py

Layer dispatch: config, schedule, attention/FFN sub-blocks, TransformerLayer