biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:27:37 +00:00)
00ac46c9d3 FMHA SM100: Phase 1 — reference scalar implementation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:26:00 +00:00)
6f7449ce71 FMHA SM100: Fix tcgen05.mma PTX syntax — correct register constraints

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:24:35 +00:00)
a11a245307 fix: use unsigned short for BF16 storage, inline PTX for conversions

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:22:25 +00:00)
2d4e2c57e0 auto: pre-test commit

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:22:16 +00:00)
97df02ea07 fix: -Xcompiler -fPIC for nvcc shared library

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:21:43 +00:00)
4dfb71bc20 test: nvcc direct compilation test (avoid torch JIT __bf16 ICE)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:20:32 +00:00)
373900fa08 FMHA SM100: Fix launch wrapper to match new kernel API

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:19:39 +00:00)
a30ebfb197 FMHA SM100: Full kernel with TMET PTX, UMMA descriptors, softmax loop

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:16:43 +00:00)
09dfd4a41f fix: rename .cpp to .cu for CUDA compilation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:15:55 +00:00)
4c194b7254 fix: add CUDA include path for host compiler

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:15:09 +00:00)
48baea7728 FMHA SM100: Remove CUTLASS includes, write raw PTX inline asm

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:14:04 +00:00)
88d5995ec9 fix: define bf16_t using __bf16 built-in, avoid cuda_bf16.h bug

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:13:21 +00:00)
f0660d0bd7 fix: use C++20 for cuda_bf16.h compat

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:13:04 +00:00)
6bd3356582 fix: include cuda_bf16.h unconditionally, add --expt-relaxed-constexpr

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:12:32 +00:00)
c1266b5275 fix: include cuda_bf16.h only in device code

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:12:09 +00:00)
a64e55665b fix: avoid cuda_bf16.h, use inline PTX for BF16 conversion

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:11:40 +00:00)
1734d13f60 fix: restore cuda_bf16.h include

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:11:17 +00:00)
8783a25deb fix: guard cuda_bf16.h with __CUDA_ARCH__

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:10:44 +00:00)
5e389b5ed9 fix: remove duplicate desc declaration

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 05:10:17 +00:00)
7ac2499266 fix: defer UMMA descriptor — use placeholder for now