biondizzle
Joined on 2025-12-10
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:28:33 +00:00
039c8b90ce
diag: print expected unnorm P@V for comparison with raw kernel output
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:27:05 +00:00
ec5b892e32
diag: skip final normalize, test raw PV output via epilogue_tma_store
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:25:54 +00:00
e50644afde
fix: O rescale uses 2D register tensor pattern (matching CUTLASS correction_rescale)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:24:41 +00:00
b77c8d83f5
fix: pre-compute tmem_load_epi_atom in __call__, pass to kernel
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:23:06 +00:00
c9271ffbf4
fix: index into TMA partitioned tensors for copy
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:22:25 +00:00
e01ff282b7
fix: use flat_divide+group_modes(0,2) for TMA store, matching CUTLASS
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:20:53 +00:00
5efa9c9297
fix: use gC not tCgC for TMA partition, group modes 0-3
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:20:10 +00:00
7a894c4bf6
fix: use tma_partition for TMA store in correction_epilog
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:19:01 +00:00
3c134f7e90
fix: replace TMEM round-trip normalize with CUTLASS correction_epilog pattern
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:17:16 +00:00
690fd77e6c
diag: inv_row_sum=1.0 to test raw PV, n=128 only
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:13:31 +00:00
2b93b10199
diag: test original code n=128+256 to confirm baseline
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:12:51 +00:00
9bcddb68e1
diag: disable O rescale properly, test n=128+256 baseline
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:12:09 +00:00
0ef41266de
diag: test n=128 and n=256 both with rescale disabled
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:11:27 +00:00
dc44fa187a
fix: indentation error in diag disable
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:11:02 +00:00
a9cace316d
diag: disable O rescale to isolate the issue (n=256 only)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 01:02:35 +00:00
08d4af90ca
debug: add wide-search diagnostics for n=256 O rescale
biondizzle
pushed tag
v-multitile-softmax-wip
to
biondizzle/nvfp4-megamoe-kernel
2026-05-23 00:35:49 +00:00
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 00:35:44 +00:00
f026c1824c
🚀 MULTI-TILE SOFTMAX + O RESCALE WORKING: n=128 cos 0.999998, n=256 cos 0.80
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 00:34:38 +00:00
d511ebe387
Debug: add row_sum/inv_row_sum printf at final normalize
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-23 00:33:39 +00:00
c2ff8e072e
Fix ALL loops: use self.n_kv_tiles everywhere