biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25

Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:24:41 +00:00
bf36979a8d Use CUTLASS FMHA reference pattern for sC->GMEM TMA store (flat_divide + tma_partition)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:15:49 +00:00
97bc6d8d2f Add c_direct GMEM tensor for direct writes in SMEM accumulator path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:14:35 +00:00
3d349b497b SME accumulator: direct GMEM write from sO_acc (bypass TMA for multi-kt)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:13:39 +00:00
7d1e0a605d Different coordinate dims for bSG_sC (2D) and bSG_gC (3D)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:13:00 +00:00
75b272c5f2 2D coordinate for bSG_sC TMA copy
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:12:15 +00:00
72dff90165 3D coordinate for bSG_sC/gC TMA copy
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:11:27 +00:00
b8b6e8cc0b Slice bSG_gC MMA tile coords for TMA copy
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:10:42 +00:00
754740d5e5 Try bSG_sC[(None, 0)] for TMA copy coordinate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:09:57 +00:00
23a2b49daf Add SMEM accumulator for n_kv_tiles>1: O load from TMEM, accumulate in sO_acc, TMA store from sC
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:06:54 +00:00
a858ed1c14 Fix test: normalize=False for un-normalized O comparison
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:05:43 +00:00
2e262d2b99 Reset fmha_smem_acc.py to working fmha.py base
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:05:03 +00:00
b43ffe9dac Guard sO_acc allocation/zero-init with n_kv_tiles>1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:02:52 +00:00
101840c78c Guard SMEM accumulation with n_kv_tiles>1 to avoid TMEM destructive read
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:01:41 +00:00
02a34512cb Use epilogue_tma_store for n_kv_tiles=1; TODO for multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:00:40 +00:00
4652cab8b4 Fix: 3D coords for TMA copy (bSG_sC has 3 modes)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 05:00:05 +00:00
b0ebf41ee3 Slice bSG_gC with mma_tile_coord (like epilogue_tma_store)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 04:59:31 +00:00
eb0bf0cce0 Fix TMA store: use bSG_sC[(None,0)] indexing pattern from epilogue_tma_store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 04:58:48 +00:00
7ea77a121f Use cpasync.tma_partition for SMEM->GMEM TMA store (like epilogue_tma_store)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 04:57:41 +00:00
e614d0894c Clean up SMEM acc epilogue: flat indexing sO_acc->sC, TMA store from sC_s0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-27 04:55:20 +00:00
1724eeb8ec Fix TMA store: use epi_s view of sC for proper layout compatibility