biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 18:22:28 +00:00
bd169ccb0f
fix: smart quote in fmha_tma.cuh
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 18:22:02 +00:00
345b107f4c
fix: TMA mbarrier — add arrive.expect_tx (root cause of multi-warp hang)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 07:02:09 +00:00
c69f3668e1
feat: TMA async FMHA kernel — WORKING on B200
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 06:52:42 +00:00
a40c05f3f2
archive: TMA driver-API files + CUDA 13 TMA discovery notes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 06:42:08 +00:00
55f0c6267b
auto: pre-test commit
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 06:35:06 +00:00
197cac875c
fix: CUDA 13 TMA descriptor — 3D rank + byte strides + mbarrier byte count
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:45:54 +00:00
85cd95e609
debug: TMA context fix test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:45:07 +00:00
76c82ebdcd
debug: detailed TMA descriptor debug test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:43:26 +00:00
0c9245b4d2
fix: add cuInit(0) for CUDA driver API
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:42:45 +00:00
6cc2f61431
debug: TMA descriptor dimension test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:42:04 +00:00
3412ff1a9b
fix: TMA tile strides must match global strides, not tile dimensions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:41:04 +00:00
409838ace2
refactor: per-sub-tile TMA loads with padded GMEM allocations
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:39:23 +00:00
8c17f65f5b
fix: cast typo
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:39:14 +00:00
8908b697dd
fix: bool type mismatch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:38:59 +00:00
b78ebe8a9c
debug: add TMA descriptor error reporting
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:37:50 +00:00
c7a6d7d231
fix: tma_mbar_init → tma_mbarrier_init (typo)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-29 04:36:55 +00:00
696462f07a
feat: TMA async load infrastructure for FMHA kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 23:57:34 +00:00
d1c1eaeddc
clean: remove debug prints, multirow kernel complete with multi-tile KV merge
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 23:55:59 +00:00
c65baabcc9
fix: V tile copy — V is (HD, SK_TOTAL) so tile columns are not contiguous
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 23:54:33 +00:00
869460a932
debug: add LSE verification and merge debug prints