nvfp4-megamoe-kernel/dsv4/kernels/attention at 6bd3356582c4dd2f1b374c2014eaeeb2bfbdb87b

biondizzle/nvfp4-megamoe-kernel

biondizzle 6bd3356582 fix: include cuda_bf16.h unconditionally, add --expt-relaxed-constexpr

2026-05-28 05:13:01 +00:00

__init__.py

Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API

2026-05-27 15:15:03 +00:00

fmha_backup_pre_epilog.py

D1.5: Replace TMEM round-trip normalize with correction epilog (one-way: TMEM→reg→SMEM→GMEM)

2026-05-24 00:24:24 +00:00

fmha_backup_v2.py

D1.5: Prepare for SMEM accumulator implementation

2026-05-26 21:00:41 +00:00

fmha_sm100.cpp

fix: guard CUTLASS includes with __CUDA_ARCH__ for host compilation

2026-05-28 05:09:07 +00:00

fmha_sm100.cuh

fix: include cuda_bf16.h unconditionally, add --expt-relaxed-constexpr

2026-05-28 05:13:01 +00:00

fmha_smem_acc.py

Stage E: production attention wrapper + Python KV merge, clean fmha_smem_acc

2026-05-27 06:34:10 +00:00

fmha.py

Revert "D1.5: WIP SMEM accumulator — framework in place, accumulation logic TODO"

2026-05-27 02:17:26 +00:00

fmha.py.backup

auto: pre-test commit

2026-05-23 20:08:31 +00:00

production.py

Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API

2026-05-27 15:15:03 +00:00