biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 22:46:23 +00:00)
74145a31cc feat: V TMA loads in multi-tile kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 22:42:48 +00:00)
680d2ebf64 test: V TMA diagnostic — isolate V TMA descriptor issue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 20:02:01 +00:00)
077fbdf3c5 test: HD=128/256 multi-tile variants

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:59:24 +00:00)
7df17384fd test: multi-tile s_k=128/256/384/512

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:57:57 +00:00)
d47b2bfcce fix: use un-normalized P for multi-tile PV (correct online softmax merge)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:56:37 +00:00)
43ae3e7f98 fix: reload Q per-K-sub-tile in multi-tile kernel (same as single-tile)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:53:04 +00:00)
7598d548ee debug: test multi-tile with s_k=128 only

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:49:55 +00:00)
8e99bd50e6 feat: 6-warp TMA multi-tile KV kernel with register accumulator + test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:47:50 +00:00)
1814510195 wip: add n_kv_tiles param for multi-tile KV (not yet used)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:45:46 +00:00)
d20792aa9d fix: TMA descriptor index for batched multi-head (batch*n_h + head)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:44:59 +00:00)
754c6a692c feat: per-head TMA descriptors for multi-head FMHA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:43:42 +00:00)
9eb193458e test: refactored multi-row TMA test with multi-head and batch

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:41:43 +00:00)
832a04181d test: relax relative error threshold to 5% for BF16, use cosine > 0.999 as pass criterion

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:40:34 +00:00)
bfef94f5d0 test: HD=128/256 multi-row TMA FMHA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:39:19 +00:00)
a1b2ab79a1 feat: 6-warp TMA FMHA multi-row kernel + test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:36:43 +00:00)
d0a50f1f2e fix: remove double normalization in TMA epilogue (P already normalized before PV)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:35:47 +00:00)
fb971781aa fix: revert V to direct load (V TMA needs debugging), K TMA works

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:34:49 +00:00)
cd2c028b39 feat: TMA loads for both K and V in 6-warp FMHA kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:32:52 +00:00)
523d3838a2 test: HD=128/256 variants for TMA FMHA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 19:32:19 +00:00)
bd4f09d514 fix: ambiguous MMA_K_BF16 in test