biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 19:09:35 +00:00)
d77c965646 Disable O rescale + normalize, verify softmax P only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 19:06:54 +00:00)
dcc64dd14d FIX: O sub-tile count should be HEAD_DIM/corr_tile_size, not 128/corr_tile_size
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 19:06:05 +00:00)
48b24ba005 Full pipeline: O rescale + final normalize with CUTLASS sub-tile approach
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 19:05:28 +00:00)
a85894df89 Test softmax P vs unnormalized reference (no O normalize)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 19:04:39 +00:00)
c0b39fc2bf O normalize using CUTLASS reference sub-tile approach
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:59:17 +00:00)
3dbda0eebb Fix O normalize: use 2D register tensor indexing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:58:36 +00:00)
6b61d5274c Add O normalization with sub-tile TMEM read-modify-write
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:57:01 +00:00)
b936c6220d Simplify: softmax P only, no O rescale/normalize yet
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:55:32 +00:00)
e2fad84205 Real softmax test built on working identity diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:53:01 +00:00)
d3b662d3a8 CRITICAL FIX: remove extra scale_log2 in softmax (minus_row_max and acc_scale)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:51:23 +00:00)
32869c7378 FIX: K slice (None,None,0,0) like working diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:49:34 +00:00)
4b1fc7ee1f Diag: identity softmax on example6 pipeline to isolate softmax bug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:47:55 +00:00)
912f92c6b5 Quick test: working v3 with n=256 multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:47:08 +00:00)
5b6392beaa DEBUG: add version marker to confirm code changes are running
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:45:57 +00:00)
c7d55a5f49 CRITICAL FIX: TMA pre-slice (None,0,None,0) → (None,None,0,0) to keep GMEM tile dim free
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:43:57 +00:00)
f734610268 Diag: TMA shapes with hardcoded major modes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:43:32 +00:00)
18a589347c Diag: simplified TMA shape analysis
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:43:07 +00:00)
7ad4ddb6ba Diag: print TMA partition shapes for multi-tile debugging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:26:33 +00:00)
67c5a0928d FIX: Use Python range() in TMA warp for concrete per-iteration GMEM coords
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 18:25:16 +00:00)
54de81985f FIX: Force SSA GMEM coord via n_kv_tiles - n_kv_tiles instead of cutlass.range kt