biondizzle
0 Followers · 0 Following · Joined on 2025-12-10
Repositories: 25

Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:49:47 +00:00)
45cf89a556 fix: use TMEM round-trip normalize + epilogue_tma_store (known ~3% error)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:45:32 +00:00)
350c7c36ac fix: correct bSG_gC indexing (6 modes)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:44:49 +00:00)
6318b4da29 diag: print bSG shapes for TMA store indexing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:44:02 +00:00)
28060dd944 fix: typo from_dlcap -> from_dlpack
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:41:09 +00:00)
048a546e76 fix: correction_epilog with paired atoms + pre-partitioned TMA store outside if block
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:37:51 +00:00)
0700745852 test: NO-OP round-trip + normalize at n=128 and n=256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:34:34 +00:00)
2ebfcb2278 fix: correction_epilog with paired atoms + pre-partitioned TMA store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:32:41 +00:00)
49bf6e8294 diag: NO-OP round-trip before normalize on 2D pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:31:30 +00:00)
6cf1f17904 fix: O rescale uses 2D register tensor pattern, remove fence_view_async_tmem_load
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:26:59 +00:00)
7842d86294 fix: use paired atoms for correction_epilog + cute.copy TMA store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:25:47 +00:00)
1f4e40decc diag: add CUDA_LAUNCH_BLOCKING for crash debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:24:38 +00:00)
728a24db6a fix: inline epilogue_tma_store with inv_row_sum multiply using paired atoms
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:23:17 +00:00)
0ecde542f1 fix: use cute.copy instead of cpasync.copy for TMA store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:19:42 +00:00)
702bf8aa29 fix: correction_epilog with get_tmem_load_op paired atoms + direct TMA store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:15:31 +00:00)
ea66b6ee8d diag: NO-OP TMEM round-trip test — load+store back unchanged
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 02:13:53 +00:00)
6ee28d8423 fix: inline epilogue with paired atoms + inv_row_sum normalize, no TMEM round-trip
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 01:41:37 +00:00)
043b66406a fix: all epilogue warps do TMA store, no dynamic if inside method
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 01:40:14 +00:00)
db3572bafb fix: correction_epilog with get_tmem_load_op paired atoms, no TMEM round-trip
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 01:36:29 +00:00)
d99a90ade5 fix: use attn_raw (not softmax'd) for unnorm computation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 01:35:20 +00:00)
7becdaf739 diag: skip kernel normalize, do Python-side normalize to isolate TMEM round-trip issue