biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 23:01:14 +00:00)
d54bce6a6d fix: correct SMEM size for MMA 4-warp test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 23:00:29 +00:00)
be45e87891 test: MMA→4-warp TMEM read — do warps see different rows?
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:59:33 +00:00)
6b0d57074a test: TMEM cross-warp visibility with different sync strategies
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:58:50 +00:00)
77d190278e test: simpler TMEM 4-warp read — direct store+load
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:58:01 +00:00)
91b03bd6bd test: verify 4-warp TMEM read with 32x32b.x8 after MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:56:30 +00:00)
28e04a5ea8 fix: use __cvta_generic_to_shared directly for 64-bit compat
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:56:17 +00:00)
1d6a95df32 fix: typo in tmem row offset test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:56:01 +00:00)
cf6fe71368 test: verify TMEM 32x32b.x8 row offset addressing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:53:51 +00:00)
4cfb707405 fix: correct SMEM size calculation in multirow test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 22:52:53 +00:00)
863a030c3b fmha_multirow: rewrite with 32x32b.x8 only, no s_p_vals, row_page addressing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 21:08:17 +00:00)
1ba304db3e stuff
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:13:54 +00:00)
deaa3ec725 CRITICAL FIX: Q/K SMEM canonical layout must use local d (0..15) not full_d — UMMA descriptor reads from sQ0/sK0 start, not offset
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:10:18 +00:00)
08694b8136 Fix multi-row softmax v3: 32x32b.x8 with per-lane per-row (no wmax/wsum), per-row sRowMax/sRowSum arrays
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:08:32 +00:00)
aaa76c1af1 Rewrite multi-row softmax using 16x256b.x1 TMEM reads for proper multi-row access
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:06:17 +00:00)
5e3c61184c Fix multi-row softmax: remove cross-lane wmax/wsum — each lane handles its own row independently
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:05:02 +00:00)
bf4dfd131b Fix nvcc goto-bypasses-init: move var decls before goto targets
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:04:46 +00:00)
2b09d4f2ef Fix nvcc goto-bypasses-init in multi-row test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 20:04:31 +00:00)
d8b421ccee Multi-row FMHA kernel (Milestone 4): T>1 prefill support with 4-warp parallel softmax
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 19:35:07 +00:00)
adc88613fa Milestone 5 COMPLETE: multi-head FMHA grid launch verified on B200
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 19:33:06 +00:00)
3fd302e7a0 Fix nvcc goto-bypasses-init errors in multi-head test