nvfp4-megamoe-kernel

biondizzle 0d1cd1e216 P4: Add QuantizedActivation + Nvfp4Linear.run_from_quantized

- QuantizedActivation: carries (x_fp4, x_sf, gsa) for skip-quantize path
- Nvfp4Linear.run_from_quantized(): runs GEMM with pre-quantized input
- Enables fused RMSNorm+quantize to feed directly into all downstream
  linears (q_a, kv, o_proj, etc.) without re-quantizing

2026-06-02 16:37:38 +00:00

cache

E1: Wire LayerCacheHandle gather methods + CUDA gather kernels

2026-05-30 21:09:21 +00:00

kernels

P4: Fix fused RMSNorm kernel — match quantize_nvfp4.cu encoding

2026-06-02 16:28:44 +00:00

layers

P4: Add QuantizedActivation + Nvfp4Linear.run_from_quantized

2026-06-02 16:37:38 +00:00

loader

Restructure: cutedsl/ -> dsv4/ with proper layering