biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 07:12:27 +00:00
bd16e8fa85 fix: use tcgen05.wait::st/ld instead of nonexistent tcgen05.fence
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 07:09:08 +00:00
ba1e81f2dc test: minimal TMEM isolation test (alloc, store, load, dealloc)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 07:05:10 +00:00
4fe9bbab48 add back in the archived code
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 07:01:38 +00:00
4336de9372 attention/: Clean up folder, archive backups, add detailed status headers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:46:29 +00:00
d46ae8b967 test: disable TMEM test (hanging), verify reference still works
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:42:02 +00:00
e58980f80e fix: increase test timeout for TMEM kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:39:21 +00:00
a391615f60 fix: uint64_t for SMEM pointer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:37:56 +00:00
b4779e3f48 fix: cvta.to.shared.u64 for 64-bit SMEM pointers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:36:50 +00:00
cf264bd0e2 fix: cvta.shared.u32 (not cvta.to.shared)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:35:51 +00:00
771799e112 FMHA SM100: Fix TMEM operations — uint32_t registers, correct PTX syntax
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:32:56 +00:00
73d1e38129 fix: last HD→HD_val
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:32:05 +00:00
e940786fd5 fix: HD_val variable name in test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:31:06 +00:00
e173295a3a FMHA SM100: Refactor into common + reference + TMEM epilogue headers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:29:13 +00:00
a73fb689f9 fix: dispatch template HD at compile time
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 06:27:57 +00:00
bcc5d0b6cb FMHA SM100: Add TMEM+correction epilogue kernel (Priority 2)
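In FlashAttention-style kernels, the "correction" step typically rescales the running output accumulator and the running softmax denominator by exp(m_old - m_new) whenever a new block of keys raises the running row maximum. Assuming that is what the correction epilogue above refers to, the following minimal single-threaded C++ sketch shows the rescaling. It was not taken from the repository, and `RowState` and `online_softmax_block` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical running state for one query row (names are illustrative).
struct RowState {
    float m = -std::numeric_limits<float>::infinity();  // running max score
    float l = 0.0f;                                      // running sum of exp(s - m)
};

// Folds one block of scores (and their value rows, each acc.size() wide)
// into the unnormalized accumulator acc = sum_j exp(s_j - m) * v_j.
void online_softmax_block(RowState& st, std::vector<float>& acc,
                          const std::vector<float>& scores,
                          const std::vector<float>& values) {
    const std::size_t hd = acc.size();
    float block_max = st.m;
    for (float s : scores) block_max = std::max(block_max, s);

    // Correction: rescale earlier partial results to the new maximum.
    // On the first block m is -inf, so the (empty) history is scaled by 0.
    const float corr = std::isinf(st.m) ? 0.0f : std::exp(st.m - block_max);
    st.l *= corr;
    for (float& a : acc) a *= corr;

    // Accumulate this block's contributions relative to the new maximum.
    for (std::size_t j = 0; j < scores.size(); ++j) {
        const float p = std::exp(scores[j] - block_max);
        st.l += p;
        for (std::size_t d = 0; d < hd; ++d) acc[d] += p * values[j * hd + d];
    }
    st.m = block_max;
}
```

Dividing `acc` by `st.l` after the last block gives the softmax-weighted output. The rescaling lets a kernel consume keys block by block without knowing the global maximum in advance.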
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 05:34:05 +00:00
8eb735618f fix: use expf for softmax (not exp2f with scale)
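Fused attention kernels often compute softmax exponentials as exp2f(x * log2(e)), sometimes with the softmax scale folded into the same multiplier, because base-2 exponentiation maps onto the GPU's fast ex2 instruction. Mathematically e^x = 2^(x · log2 e). The commit above switches back to plain expf. The following minimal C++ sketch compares the two forms for one softmax row. It was not taken from the repository, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over scaled scores using the natural exponential directly.
std::vector<float> softmax_exp(const std::vector<float>& s, float scale) {
    const float mx = *std::max_element(s.begin(), s.end());
    std::vector<float> p(s.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < s.size(); ++i) {
        p[i] = std::exp((s[i] - mx) * scale);
        sum += p[i];
    }
    for (float& x : p) x /= sum;
    return p;
}

// Same softmax via base 2: e^x == 2^(x * log2(e)), so log2(e) is folded
// into the scale before calling exp2.
std::vector<float> softmax_exp2(const std::vector<float>& s, float scale) {
    const float log2e = 1.4426950408889634f;
    const float mx = *std::max_element(s.begin(), s.end());
    std::vector<float> p(s.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < s.size(); ++i) {
        p[i] = std::exp2((s[i] - mx) * (scale * log2e));
        sum += p[i];
    }
    for (float& x : p) x /= sum;
    return p;
}
```

The two functions agree up to float rounding. The exp2 form is only correct if the log2(e) factor is actually folded into the multiplier, and that factor is easy to get wrong when switching between the two forms.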
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 05:32:50 +00:00
3cb339129b FMHA SM100: Fix Phase 1 — single-thread reference for correctness
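A single-threaded reference is a common correctness oracle for a fused kernel. It computes the attention output for one decode query directly, with no tiling or online rescaling, so the optimized kernel's output can be compared against it element by element. The following minimal C++ sketch uses assumed conventions: one query row, K and V stored row-major as [seq_len][head_dim], and a softmax scale of 1/sqrt(head_dim). `decode_attention_ref` is a hypothetical name, and this is not the repository's `fmha_decode_ref`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-threaded reference: out = softmax(q . K^T * scale) . V
// for one query vector q of length hd against seq_len keys/values.
std::vector<float> decode_attention_ref(const std::vector<float>& q,
                                        const std::vector<float>& K,
                                        const std::vector<float>& V,
                                        std::size_t seq_len, std::size_t hd) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(hd));  // assumed scale
    std::vector<float> s(seq_len);
    for (std::size_t j = 0; j < seq_len; ++j) {
        float dot = 0.0f;
        for (std::size_t d = 0; d < hd; ++d) dot += q[d] * K[j * hd + d];
        s[j] = dot * scale;
    }
    // Numerically stable softmax: subtract the row max before exponentiating.
    const float mx = *std::max_element(s.begin(), s.end());
    float sum = 0.0f;
    for (float& x : s) { x = std::exp(x - mx); sum += x; }

    // Weighted sum of value rows.
    std::vector<float> out(hd, 0.0f);
    for (std::size_t j = 0; j < seq_len; ++j)
        for (std::size_t d = 0; d < hd; ++d)
            out[d] += (s[j] / sum) * V[j * hd + d];
    return out;
}
```

Because it is slow but simple, a reference like this is typically run on small shapes and compared against the kernel within a float tolerance.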
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 05:31:40 +00:00
7fb838913f fix: include path for standalone test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 05:31:05 +00:00
99b35eb2de test: standalone CUDA test for FMHA SM100 (no PyTorch needed)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 05:28:57 +00:00
77fa34a9a6 fix: update launch wrapper for fmha_decode_ref