biondizzle
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 08:01:21 +00:00)
3549a2388b fix: constexpr HD for template param

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 08:00:44 +00:00)
7436315309 feat: add tcgen05.mma QK GEMM verification kernel + test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:54:03 +00:00)
6fb3d54c02 docs: update here-docs with CuTeDSL rationale for NVIDIA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:49:50 +00:00)
9524b674ab test: enable both reference + TMEM epilogue tests at hd=64/128

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:49:04 +00:00)
446a0ca9fd refactor(tmem): clean rewrite of TMEM epilogue kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:47:39 +00:00)
c989dc78d9 debug: print sPvBuf[32] value

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:46:58 +00:00)
146e4f0282 debug: print NaN positions in test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:46:22 +00:00)
b50f6a8512 debug: add TMEM read diagnostic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:45:33 +00:00)
a12607b0bd test: add NaN counter to FMHA test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:44:46 +00:00)
53c676c8a6 test: add max_abs_diff to FMHA test output

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:44:04 +00:00)
579dd061cd fix: remove duplicate TMEM_COLS_NEEDED declarations

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:43:42 +00:00)
278f1b34af fix(tmem): correct lane-to-position mapping for tcgen05.ld/st

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:42:19 +00:00)
593bc25afa test: add TMEM lane mapping diagnostics

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:41:18 +00:00)
33cedbee0a fix(tmem): TMEM ld/st are warp-collective — ALL 32 lanes must call them

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:40:06 +00:00)
cea02fe407 fix: add cstdio for printf in TMEM debug

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:39:55 +00:00)
0ddcc6bafd debug: add printf to TMEM kernel to find hang point

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:38:50 +00:00)
44fb04fa1f test: disable tmem epilogue test (debugging reference hang)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:36:13 +00:00)
224d7e24c6 harness: add fire_b200_cuda_test + check_b200_cuda, update README

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:31:44 +00:00)
cec505ce14 add CUDA test runner script (screen-based, follows harness pattern)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 07:27:36 +00:00)
2eb44a00bf fix(tmem): warp-collective TMEM ops + one-way correction epilogue