biondizzle/nvfp4-megamoe-kernel
55def5eef9ebf016fc7dbe274fa5eb504a411148
nvfp4-megamoe-kernel/dsv4/ops
Latest commit: 676fad064f by biondizzle, 2026-06-03 21:45:15 +00:00
Fix: Add out= parameter to run_fused_swiglu_grouped_gemm signature
File | Last commit | Date
__init__.py | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00
custom_ops.py | Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API | 2026-05-27 15:15:03 +00:00
gemm_runner.py | Fix: Add out= parameter to run_fused_swiglu_grouped_gemm signature | 2026-06-03 21:45:15 +00:00
layouts.py | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00
quantize.py | CUDA graph: Eliminate per-step allocations in graph-captured code paths | 2026-06-03 21:30:24 +00:00
rope_cuda.py | fix: rope_cuda path — kernels/cuda not ops/cuda | 2026-06-02 09:06:36 +00:00
router.py | router: catch CuTeDSL warmup failures fast, don't let MLIR errors slow down init | 2026-06-01 00:00:07 +00:00