nvfp4-megamoe-kernel/dsv4/kernels at 024be1a60bc118a8525962ea9c5bf18a4ecec0f5

biondizzle/nvfp4-megamoe-kernel

biondizzle 19afa52e80 fix: use cute.where() directly for clamp in fused SwiGLU

(silu_result > limit).float() doesn't work on TensorSSA.
cute.where(cond, true_val, false_val) is the correct TensorSSA API.
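
The select-based clamp described above can be sketched in plain Python. This is only an illustration of the pattern, not the kernel's code: `where` here is a hypothetical list-based stand-in that mirrors the `(cond, true_val, false_val)` argument order quoted in the commit message, and `clamp_upper` and the sample values are assumptions made for the example.

```python
def where(cond, true_val, false_val):
    """Elementwise select over plain lists.

    Hypothetical stand-in that mirrors the (cond, true_val, false_val)
    argument order from the commit message; it is not the CuTe DSL call.
    """
    return [t if c else f for c, t, f in zip(cond, true_val, false_val)]


def clamp_upper(values, limit):
    """Clamp every element to at most `limit` with a single select."""
    # Build the boolean condition once, then pick `limit` wherever it holds.
    cond = [v > limit for v in values]
    return where(cond, [limit] * len(values), values)


# Assumed sample data standing in for the SiLU output.
silu_result = [0.5, 7.0, 12.5, -1.0]
clamped = clamp_upper(silu_result, 7.0)
print(clamped)
```

Selecting directly on the condition avoids converting the comparison result to a float mask, which is the step the commit message reports as failing on TensorSSA.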

2026-06-02 08:16:41 +00:00

perf: skip MQA GQA expansion in FMHA (stride=0, no 128x K/V copy)

2026-06-02 03:54:03 +00:00

fix: correct gather.py kernel_dir path

2026-05-30 21:12:09 +00:00

fix: import torch.utils.cpp_extension explicitly in production_compress

2026-06-01 05:20:44 +00:00

perf: fused mHC Sinkhorn CUDA kernel (1 launch vs 38)

2026-06-02 03:50:57 +00:00

fix: use cute.where() directly for clamp in fused SwiGLU

2026-06-02 08:16:41 +00:00

P0 COMPLETE: Eliminate ALL .item() CPU-GPU syncs from NVFP4 activation path

2026-06-01 21:05:03 +00:00

Switch router to Nvfp4Linear production GEMM (custom CuTeDSL kernel crashes MLIR)

2026-06-01 11:17:54 +00:00

__init__.py

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00