biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 07:13:16 +00:00)
f3a7dc1598
FOOTGUN #0: num_tma_load_bytes MUST include V bytes. Fix v27, v29, comment all. Update README.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 07:08:53 +00:00)
d873284e84
v29: FIX DEADLOCK - add V bytes to num_tma_load_bytes. V=I(128,128) cosine 1.0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 06:46:16 +00:00)
21c9a4823e
README: update with v28/v29 deadlock investigation, FMHA softmax bridge trace, new footguns
6ec308dd50
v29 (padded V, deadlocks), v30 (diag copy, works) — debugging epilogue deadlock with (128,128) PV
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 05:55:29 +00:00)
261fe5e43e
even more stuff
3fdb9f008b
v28 attempt: PV MMA (128,64) - cosine 0.004, debugging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 05:17:17 +00:00)
ce63ec7ee9
README: Bug 4 root cause — TMEM layout mismatch (128,64) PV A-fragment vs softmax P write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 05:09:04 +00:00)
1f6fb856ea
more stuff
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 04:40:30 +00:00)
1314a67135
Stage B progress: PV works for square (128,128), broken for (128,64)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 04:10:20 +00:00)
2cbb0369fd
Stage B: pipeline deadlock fixed, V MN-major applied, PV output garbage
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 00:12:50 +00:00)
1eca10e39f
Stage B: C-fragment vs A-fragment TMEM layout mismatch diagnosed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 20:26:31 +00:00)
e678afcde0
Stage B: two MMAs + identity softmax — crash fixed, softmax output still wrong
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 07:15:05 +00:00)
d3a7e7a286
stuff
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 06:51:15 +00:00)
1b4742a438
Update README: reflect current state, add C128A/C4A topk + warmup fixes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 06:48:12 +00:00)
a793751fe8
Fix warmup compilation + add sparse topk metadata kernels
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 05:46:17 +00:00)
dd7af0cd8a
feat: GPU-native SWA + sparse decode attention kernels (CuTeDSL)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 04:42:59 +00:00)
16b3094bdb
README: comprehensive update with current kernel status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 04:40:00 +00:00)
efa0a156a0
Update README with final kernel status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 04:39:50 +00:00)
fffb2144ae
Custom CUDA kernel for de-interleave plus NVFP4 quantize
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 04:20:49 +00:00)
7fa81e6990
Remove debug print statements from pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 04:13:58 +00:00)
d775d1075d
Fused SwiGLU epilogue with granularity-8 weight interleave
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-20 03:30:37 +00:00)
f8716a1fa1
docs: rewrite README.md with current project state