biondizzle/nvfp4-megamoe-kernel
Path: nvfp4-megamoe-kernel/dsv4/kernels/attention (at commit 107d62dd76215946fcdda9240f72b6c0c1dbb708)
Latest commit: ca53bdb8e1 by biondizzle, 2026-06-02 03:54:03 +00:00
perf: skip MQA GQA expansion in FMHA (stride=0, no 128x K/V copy)
File                                   Last updated                Last commit message
__init__.py                            2026-05-28 07:01:33 +00:00  attention/: Clean up folder, archive backups, add detailed status headers
fmha_6warp_tma_multirow_multitile.cuh  2026-05-31 23:12:20 +00:00  FMHA sink: don't double-scale sink bias
fmha_common.cuh                        2026-05-28 07:54:01 +00:00  docs: update here-docs with CuTeDSL rationale for NVIDIA
fmha_multitile_capi.cu                 2026-05-31 23:10:13 +00:00  FMHA sink bias in kernel + single_shot production rewrite
fmha_multitile_op.py                   2026-06-02 03:54:03 +00:00  perf: skip MQA GQA expansion in FMHA (stride=0, no 128x K/V copy)
fmha_tma.cuh                           2026-05-30 17:07:24 +00:00  P6: TMA store uses mbarrier completion (same as load)
fmha_umma_desc.cuh                     2026-05-29 18:25:47 +00:00  fix: warp-stride for TMA canonical writes — only load warp calls them
production.py                          2026-06-02 03:54:03 +00:00  perf: skip MQA GQA expansion in FMHA (stride=0, no 128x K/V copy)