biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:32:09 +00:00)
c82c1ddc1b test: add multiple seeds to verify softmax consistency

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:31:10 +00:00)
a24b3e75a2 fix: use plain range loop for row_max (fmax not allowed in vectorized)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:30:37 +00:00)
c96454d70b fix: add missing old_row_max = row_max before softmax max computation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:29:46 +00:00)
aa9c2d2308 fix vectorize issue: remove vectorize from exp2 pass, add row_sum accumulation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:28:35 +00:00)
f631ff16d6 fix: use cute.arch.fmax instead of if-else in vectorized loop

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:27:37 +00:00)
941bcae8e1 softmax: element-wise row_max computation instead of .reduce()

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:24:00 +00:00)
5e51b726ba fix O normalization: use direct rmem tensor from partition_D shape

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:22:57 +00:00)
0da960d8da FMHA Stage-C: real softmax + O normalization in 6-warp layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:19:18 +00:00)
6ebccf1e7e fix: use make_smem_layout_epi not make_epilogue_smem_layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:18:57 +00:00)
208af3eadd FMHA v3 Stage-C full: 12-warp pipeline with real softmax + correction + epilogue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 08:57:44 +00:00)
b81ed1924b more stuff

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 07:29:05 +00:00)
7e1ba2b525 FMHA v3: per-row min test + explicit loop replacements

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 07:09:56 +00:00)
791bdc53a0 FMHA v3: per-row patch from Mike + deadlock fix + V layout fix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 05:52:15 +00:00)
4761931c3e FMHA v3: add debug variants for C9 normalization investigation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 01:45:08 +00:00)
201f11a339 Fix indexer score kernel: use static shared memory, correct FP4 head offsets

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 01:20:42 +00:00)
6e06aed46c Indexer: score+topk kernel, gather KV, compute_valid_lens

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 00:25:51 +00:00)
8fcbc699a8 Flush compressor: schema fix, prepare_forward, flush_write kernels, state rotation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 00:08:39 +00:00)
23abfe9845 KV Cache: schema, allocator, pools, manager, append_swa kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 23:32:06 +00:00)
44582ec43b Fix layer construction: match existing API signatures, add RMSNorm impl

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 23:26:52 +00:00)
39c1592d9c Clean up: remove debug/temp files and dangling test kernels