biondizzle
0 Followers · 0 Following · Joined on 2025-12-10
Repositories: 25

Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:25:56 +00:00)
5a08b79364 Revert "debug: test 12w identity softmax with n=256 to verify multi-tile pipeline"

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:24:54 +00:00)
6cf8702e3c debug: test 12w identity softmax with n=256 to verify multi-tile pipeline

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:23:39 +00:00)
a3c9af8fa3 debug: disable O rescaling to test multi-tile pipeline baseline

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:22:50 +00:00)
c175ec4f09 fix: revert to scaled row_max, use exp2(old_max - new_max) for O rescaling
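
Commit c175ec4f09 describes the online-softmax update used for multi-tile KV: keep the running row_max in the scaled domain and rescale the partial output O by exp2(old_max - new_max) whenever a new tile raises the max. Below is a minimal pure-Python reference model of that update for a single query row. It is not the kernel's code. It assumes scores are multiplied by softmax_scale * log2(e) so that exp2 can stand in for exp, and it uses hypothetical names (reference_attention, online_attention, tile_n) that do not come from the repository.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e); 2**(x * LOG2E) == e**x


def reference_attention(q, keys, values, softmax_scale):
    """Naive softmax(scale * q.K^T) @ V for one query row (single pass)."""
    scores = [softmax_scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d)]


def online_attention(q, keys, values, softmax_scale, tile_n):
    """Online softmax over KV tiles of size tile_n (hypothetical parameter)."""
    d = len(values[0])
    scale_log2 = softmax_scale * LOG2E  # fold log2(e) into the scale so exp2 works
    row_max = -math.inf  # running max of the SCALED scores
    row_sum = 0.0        # running sum of exp2(s - row_max)
    o = [0.0] * d        # unnormalized output accumulator
    for start in range(0, len(keys), tile_n):
        k_tile = keys[start:start + tile_n]
        v_tile = values[start:start + tile_n]
        # scaled scores S for this tile
        s = [scale_log2 * sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(row_max, max(s))
        # correction factor exp2(old_max - new_max); nothing to rescale on the first tile
        alpha = 0.0 if row_max == -math.inf else 2.0 ** (row_max - new_max)
        # P tile: every exponent is <= 0 because new_max is the scaled max
        p = [2.0 ** (x - new_max) for x in s]
        row_sum = row_sum * alpha + sum(p)
        # rescale the old O, then accumulate P @ V for this tile
        o = [o[j] * alpha + sum(pj * v[j] for pj, v in zip(p, v_tile)) for j in range(d)]
        row_max = new_max
    # final normalization by the running denominator
    return [x / row_sum for x in o]
```

Usage: online_attention(q, keys, values, 0.5, tile_n=4) should match reference_attention(q, keys, values, 0.5) to within floating-point rounding. The requirement is that the same max is used in both the P tile and alpha. Taking that max over the scaled scores also keeps every exponent passed to exp2 at or below zero, so P stays in [0, 1].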

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:21:53 +00:00)
35c8043064 fix: compute row_max from RAW S values, not scaled

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:20:38 +00:00)
f9f5647eaa fix: missing newline after self.s_k = s_k

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:19:50 +00:00)
e0c320929a fix: add s_k param to FmhaV3StageC, use self.s_k for V FMHA reconstruction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:19:09 +00:00)
fb4ffd8cf7 Stage C: add online O rescaling for multi-tile KV + test n=256

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:17:10 +00:00)
94b0d97107 fix: add epilogue warp to tmem_bar, restore wait_for_alloc in epilogue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:15:29 +00:00)
65e52f5934 fix: add softmax_done_bar to synchronize MMA PV with softmax P production

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:11:58 +00:00)
ea687980af fix: epilogue warp self-signals acc_pipe producer before consuming

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:05:52 +00:00)
19b742f365 fix: remove duplicate tmem free from epilogue (MMA warp handles dealloc)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 10:03:09 +00:00)
0a3815049f fix: add acc_pipe pipeline for epilogue, matching 12w pattern

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:59:22 +00:00)
59f4d8a469 fix: epilogue_warp_id must be tuple for epilogue_tma_store, check with [0]

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:56:19 +00:00)
6ba12b7890 fix: epilogue warp reuse mma_corr_cons pipeline instead of creating new one from st

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:53:01 +00:00)
540399eca3 fix: define cS and tScS in correction warps (not visible across if blocks)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:49:38 +00:00)
ee859099bd fix: correct @cute.kernel indentation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:46:11 +00:00)
fc7a790fbd fix: remove duplicate @cute.kernel decorator

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:42:40 +00:00)
78aac51ab9 FMHA Stage-C2: production 12-warp pipeline with correction warps

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-22 09:39:17 +00:00)
78d9024a67 README: update Stage C status to WORKING, add CuTeDSL constraints and target architecture