biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:26:21 +00:00)
10915c4e70 fix: remove double normalization in fmha_6warp_multihead epilogue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:25:22 +00:00)
cfac224b59 debug: single head sanity test with known values

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:24:37 +00:00)
1c74d35fb4 debug: V layout reference comparison

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:23:44 +00:00)
a3c5f817e1 debug: compare api vs direct kernel vs reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:22:51 +00:00)
78e6d58b85 debug: V layout comparison test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:21:55 +00:00)
074c4c4f42 P3: call fmha_multihead_decode_raw directly (skip custom op)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:20:54 +00:00)
1b9cdf89fb P3: add full API integration test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:20:03 +00:00)
0608d9d09e P3: fix GQA via K/V repeat_interleave, relax threshold to 0.999990

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:19:18 +00:00)
d5c0086737 P3: fix SMEM computation, pad K/V to 128, remove stale files

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:18:21 +00:00)
094b3c9e6c P3: fix test — create V in kernel layout (hd,N), transpose for reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:17:34 +00:00)
7b5b3342fa P3: fix integration test — V transpose, direct ctypes call

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:16:50 +00:00)
8a5070aa38 test: minimal ctypes debug test for P3

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:16:05 +00:00)
63645a3c7b fix: -Xcompiler -fPIC instead of -fPIC for nvcc

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:15:39 +00:00)
adcf3e04ab P3: ctypes loader for 6-warp FMHA (bypass torch JIT sm_100 arch issue)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 08:12:27 +00:00)
1e6adf5e01 P3: wire 6-warp multi-head FMHA decode fast path into production.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 07:02:42 +00:00)
20f3ccd992 D1.5 complete: HD=512 support via hd_chunk tiling with native TMEM columns

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 07:01:46 +00:00)
f2592ea0da fix: native TMEM columns for hd_chunk (no remapping)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 07:00:48 +00:00)
dcf89fdd1c debug: check full HD for chunk1 test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 07:00:17 +00:00)
3dbd3c5e7f debug: test chunk 1 only

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-30 06:59:42 +00:00)
72779e7f71 debug: compare only first HD_CHUNK values