biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:44:40 +00:00)
e45b94c01b Test: compare both normalized and un-normalized reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:43:01 +00:00)
b70ab2a6ee Return o_accum directly (un-normalized merge result)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:41:45 +00:00)
6111db571c Match working test: don't pass row_sums to kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:39:38 +00:00)
312ac52d15 Normalize O_accum by exp(lse) before returning

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:38:05 +00:00)
ddc701af9b Use exact merge formula from working test_d1_kv_merge.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:36:26 +00:00)
8321ccf9c1 Fix production KV merge: use normalized O for log-sum-exp merge

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 06:34:13 +00:00)
98c93c1cd8 Stage E: production attention wrapper + Python KV merge, clean fmha_smem_acc

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:39:44 +00:00)
51e456df44 Slice MMA tile coords from tOgO for TMA copy

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:38:41 +00:00)
1caa737b09 Move sC_flat_staged creation before const_expr guard

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:37:08 +00:00)
3c9dbc0c5d Staged sC_flat with (128, pv_n_tile//2, 2) to match TMA atom

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:35:56 +00:00)
de2028b106 Split sC_flat into staged layout to match TMA atom decomposition

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:34:41 +00:00)
a0e9f7534b Use tCgC_epi (transformed) for GMEM side of TMA partition

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:33:32 +00:00)
b02e103ac0 Add c_simple GMEM tensor (non-dynamic) for SMEM accumulator TMA store

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:31:49 +00:00)
2438826eee Use tma_partition with group_modes on both sC_flat and gO

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:30:52 +00:00)
603f52de78 Fix gO creation: use slice_(pv_mma_tiler) like fmha.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:29:52 +00:00)
b39d7f1a14 Try cute.copy(tma_c, sC_flat, gO) directly

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:28:45 +00:00)
2af767a90c Try full tensor TMA copy without slicing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:27:52 +00:00)
7d14a2f764 sC_flat with simple (128, pv_n_tile) layout for full epi_tile coverage

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:26:51 +00:00)
6fb0e6a417 Use sC_flat (non-swizzled epi_s layout) for TMA store from SMEM accumulator

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-27 05:25:36 +00:00)
4a2a06f9e1 Fix gO slice: use separate Int32(0) instead of tuple