biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:37:12 +00:00)
b5cd1b88c9 D2: add shape debug print for mQ/mK

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:35:51 +00:00)
df3146eb53 D2: hardcode a_major=MN for multi-CTA (Q is always MN-major in FMHA)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:34:43 +00:00)
e809e71253 D2: use tensor indexing q[0] instead of local_tile for layout extraction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:33:35 +00:00)
49c4189195 D2: fix LayoutEnum for multi-dim Q (use head-0 view for layout)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:30:10 +00:00)
2b76b691cb fix: block_idx() returns tuple, use [1] for y

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 23:27:40 +00:00)
4c79e5533e D2: add multi-CTA grid with block_idx_y for Q/O head indexing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:58:24 +00:00)
335e310c79 Update D2 status in README

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:57:51 +00:00)
e3e67c3992 NVFP4-3: enable 2-CTA UMMA when MMA tile M >= 256 (1.7-1.9x throughput)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:52:24 +00:00)
e0339a92fc D2: revert multi-CTA grid params (using per-head launch approach instead)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:49:45 +00:00)
a5271821a8 D2: add scale test (more heads, larger hd)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:48:24 +00:00)
d563c93fc5 D2: add per-head launch test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:44:12 +00:00)
9b476d87f9 fix: compare un-normalized O against un-normalized reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:41:33 +00:00)
0ca7b58a6a D1: fully revert LSE change back to original sfw_idx==0 guard

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:39:28 +00:00)
db353ec35a D2: add simple n_h=1 regression test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:28:19 +00:00)
4418e04a28 D1: revert per-row LSE to sfw_idx=0 for now (debugging D2 regression)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:26:11 +00:00)
2cc66bff68 D2: add initial multi-head test file

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:24:30 +00:00)
49e66fb6e4 D1: corrected KV merge test with proper normalized output formula

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:23:09 +00:00)
c47f648617 fix lse verify

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:22:34 +00:00)
3577e09603 D1: add LSE verification test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 22:21:53 +00:00)
674c5b9c18 D1: fix per-row LSE output + add KV merge test v2 with per-row LSE