biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 23:24:47 +00:00)
b034c915d1 10-warp debug: MMA=warp4 TMA=warp5 idle=6-9 still gives cosine 0.29

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 23:11:11 +00:00)
0b8f4da323 Layer dispatch: config, schedule, attention/FFN sub-blocks, TransformerLayer

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 22:07:55 +00:00)
c681b591a0 10-warp idle test: no crash but cosine 0.29 (6-warp gives 0.999999)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 22:04:32 +00:00)
c3a9e53253 Router: Blackwell-native fused decode kernel — real CuTeDSL implementation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 21:58:38 +00:00)
a813d2824b Router: clean up dense_router_decode.py — realistic architecture, no fake code

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 21:54:17 +00:00)
fb243a4133 Router: full kernel stack — hash, topk, activation+topk, dense decode/prefill

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 21:20:41 +00:00)
a4d12fd560 WIP: correction warp group architecture - compiles, illegal address at runtime

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 20:13:54 +00:00)
bb3ad3d2ef BREAKTHROUGH: cosine 0.993 for n=128! PV-partitioned P row sum works.

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 19:26:18 +00:00)
7d1c402a6d WIP: TMEM vector bridge not working (same cosine 0.513)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 19:16:18 +00:00)
cae87fd744 WIP: confirmed row_sum is wrong (5.5 vs correct 29.22 for row 0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 19:10:49 +00:00)
8eb569e31c BREAKTHROUGH: Found the real bug!

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 19:09:34 +00:00)
c09c660110 WIP: scalar C9 normalization - confirmed inv_row_sum is wrong

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 18:59:22 +00:00)
ce91aa26e4 WIP: QK-partitioned C9 normalization (does not work)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 18:55:03 +00:00)
1fa093ee12 BREAKTHROUGH: unnormalized P@V cosine 0.999998 for n=128!

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 18:45:32 +00:00)
c2901b2ecc WIP: TMEM vector for per-row row_sum (not yet working)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 18:04:23 +00:00)
4c203809ef WIP: Stage C softmax - partial progress

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 17:58:06 +00:00)
8e1facef01 Stage C fixes: pv_done_bar sync, acc_scale with scale, fastmath=True

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 17:49:29 +00:00)
58ca480fd1 Stage C: add validation harness with real softmax reference (C1)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 17:40:27 +00:00)
e8485b9cf5 README: add full DSV4 pipeline architecture diagram (CSA/HCA, not MLA)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-21 17:34:49 +00:00)
364d9edcd3 README: update for new dsv4/ package structure