biondizzle
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:57:01 +00:00
c55030a340
P5: clean kernel with runtime branch (single-tile unchanged, multi-tile separate path)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:53:52 +00:00
5f4856d771
P5: fix sOacc init race — use single thread (tid==0) instead of 4 softmax warps
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:50:48 +00:00
66b126ded8
P5: fix standalone test template — add n_kv_tiles to FmhaParams
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:47:49 +00:00
0f34f60494
P5: fix single-tile backward compat (normalized P for n_kv_tiles==1)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:46:10 +00:00
2649488d13
P5: in-kernel multi-KV-tile FA2 online softmax in fmha_6warp_multihead.cuh
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:42:21 +00:00
6421f7c3f3
P4 RESOLVED: TMA hang was GMEM misalignment, not descriptor/driver issue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:41:21 +00:00
58c087416b
P4: 128B-aligned GMEM, proper SMEM alignment, bit21 test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:40:23 +00:00
90c806733f
P4: test TMA with bit-21 workaround and innermost-first dims
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:38:57 +00:00
16027018df
P4: fix TMA load test (32-bit SMEM addrs, proper mbarrier)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:38:12 +00:00
e2ecdc42d8
P4: TMA load test kernel (swizzle vs no-swizzle hang diagnosis)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:37:07 +00:00
bd104c2ab2
P4: fix OOB fill enum name
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:36:26 +00:00
cdd1babf1f
P4: correct CUDA 13.2 API (dataType before rank, FloatOOBfill, globalDim)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:35:35 +00:00
8df3ccecea
P4: CUDA 13.2 has 10-param cuTensorMapEncodeTiled (no OOB fill)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:35:06 +00:00
d8ffdb66e1
P4: fix API signature rank/dtype order, OOB_FILL defines
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:34:20 +00:00
277689f8b8
P4: use proper CUDA enum names
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:33:43 +00:00
6d624a1b14
P4: remove explicit enum casts
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:33:14 +00:00
4898a946eb
P4: fix TMA descriptor dump API order (dtype before rank)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:32:35 +00:00
3943be6063
P4: fix TMA descriptor dump (cuuint64_t dims, proper CUtensorMap API)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:31:58 +00:00
4df6ea2d8c
P4: TMA descriptor dump test (cuTensorMapEncodeTiled)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel on 2026-05-30 08:29:27 +00:00
ae425b5522
P3: clean up test, remove debug files, final integration test