biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:44:11 +00:00)
6f5be8a4e4 Debug: print P values
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:43:03 +00:00)
3d15f5bb21 Debug: 1 PV K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:42:22 +00:00)
284a06ddf1 FMHA v5: clean rewrite with QK + softmax + PV SS per K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:40:57 +00:00)
342193e0b4 Fix tb scope
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:40:36 +00:00)
a6f7ef7c45 Add softmax read from TMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:40:02 +00:00)
38b0ff0bf8 Add QK GEMM to minimal PV test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:39:00 +00:00)
e9f8f9e6e3 Minimal PV with s_p_vals in SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:38:04 +00:00)
97ebb964a2 Move s_p_vals to dynamic SMEM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:37:13 +00:00)
d2387dd858 Full FMHA v4: per-K-tile P fill into reusable (128,16) buffer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:35:30 +00:00)
78b470317f PV accumulation debug with detailed TMEM read
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:33:33 +00:00)
dacbf53081 Test K-tiles 0-1 accumulated
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:32:54 +00:00)
bad31d9476 Test K-tile 1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:32:13 +00:00)
9198ed734f Test 1 PV K-tile from (128,128) P at offset 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:31:23 +00:00)
ce88cd6e9e Zero TMEM manually, all K-tiles accumulate=true
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:30:10 +00:00)
727c509454 PV SS MMA with 8 K-tile accumulation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:29:15 +00:00)
d5b0941f2e PV SS MMA with (128,128) P layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:28:26 +00:00)
f94693fdc2 Fix: add back cudaDeviceSynchronize
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:28:03 +00:00)
fb8af865f4 Check launch error
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:27:14 +00:00)
738e39cb63 Debug: add printf at kernel start
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 14:26:38 +00:00)
9e13096bf8 Debug: skip QK, write P directly to SMEM, 1 PV K-tile