nvfp4-megamoe-kernel/dsv4/model
biondizzle 4453d7475a Fix layer construction: match existing API signatures, add RMSNorm impl
- Nvfp4GroupedLinear: (n_local_groups, heads_per_group, head_dim, o_lora_rank)
- mHCLayer: takes hidden_dim and t_max_sinkhorn (not hidden_size and sinkhorn_iters)
- RMSNorm: PyTorch reference implementation (BF16, cudagraph-safe)
- Verified: all 43 Flash + 61 Pro layers construct cleanly
- All projection shapes validated against architecture spec
2026-05-21 23:31:58 +00:00
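
For orientation, a PyTorch reference RMSNorm of the kind the commit message describes might look roughly like the sketch below. This is not the code from the commit. The class layout, the `eps` default, and the FP32 accumulation are assumptions. It uses only fixed-shape tensor ops with no host synchronization or data-dependent Python branching, which is what CUDA graph capture generally requires.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Reference RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, hidden_dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # assumed default; the commit does not state a value
        # BF16 scale parameter, initialised to ones (assumed layout).
        self.weight = nn.Parameter(torch.ones(hidden_dim, dtype=torch.bfloat16))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Accumulate in FP32 for stability, then cast back to the input dtype.
        x_f32 = x.float()
        inv_rms = torch.rsqrt(x_f32.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        # Pure tensor ops only: no .item(), no branching on tensor values.
        return (x_f32 * inv_rms).to(x.dtype) * self.weight


# Usage on CPU with a BF16 input of shape (batch, seq, hidden_dim).
x = torch.randn(2, 4, 64).to(torch.bfloat16)
y = RMSNorm(64)(x)
```

With the weight left at ones, each output row has a root mean square of approximately 1, up to BF16 rounding.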