biondizzle
0 Followers
·
0 Following
Joined on
2025-12-10
Repositories
25
Public Activity
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-27 04:54:28 +00:00
3a7d87adba
Fix test_smem_acc: use keyword args for lse/row_sums
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-27 04:53:48 +00:00
6a621bdf64
D1.5: SMEM accumulator FMHA kernel — one-way TMEM→REGS→SMEM, no round-trip
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-27 02:17:53 +00:00
81acf1593c
Revert "D1.5: WIP SMEM accumulator — framework in place, accumulation logic TODO"
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-27 02:15:26 +00:00
72d88af400
D1.5: WIP SMEM accumulator — framework in place, accumulation logic TODO
a6da93ddfb
Revert "D1.5: Try O rescale with tCtO_base layout (epilogue-proven TMEM addressing)"
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-27 02:10:41 +00:00
79e2eb3b42
D1.5: Try O rescale with tCtO_base layout (epilogue-proven TMEM addressing)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 21:00:42 +00:00
f94978ffa7
D1.5: Prepare for SMEM accumulator implementation
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:55:19 +00:00
afb93eae22
D1.5: Revert broken TMEM round-trip O rescale, document as fundamentally broken
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:46:00 +00:00
42c5793add
D1.5: Add isolated round-trip test comparing s_k=128 vs s_k=256 with NOOP rescale
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:43:34 +00:00
e35b30dae6
D1.5 debug: try corr_tile_size=32 for O rescale round-trip
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:31:29 +00:00
20ed6d5114
D1.5: Add TMEM load fence before PV with ACCUMULATE, revert debug rescale factor
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:29:39 +00:00
34d64137ec
D1.5 debug: force rescale_factor=0.5 to test if round-trip code executes
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:28:57 +00:00
3be708d923
D1.5 debug: add NOOP rescale test (acc_scale=1.0) to isolate TMEM round-trip corruption
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:27:38 +00:00
c3648e4ebf
D1.5 debug: add targeted s_k=256 rescale diagnostic test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 20:26:10 +00:00
bf2c7c8bb8
D1.5: Implement in-kernel O rescale via CUTLASS correction_rescale pattern
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:53:13 +00:00
064ececc9a
Update docs: D1.5 TMEM round-trip fundamentally broken, Python KV merge is production path
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:50:33 +00:00
2b4f4ce538
Remove broken D1.5 paired-atom test (TMEM round-trip is fundamentally broken)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:50:17 +00:00
ffb3e736bb
D1.5: Revert broken paired-atom O rescale — TMEM round-trip fundamentally broken
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:46:21 +00:00
40cbf0c223
Add D1.5 paired-atom O rescale test (s_k=256/384, hd=64/128)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:34:28 +00:00
43f0b5d1e8
D1.5: Fix O rescale with paired atoms (incremental approach)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-26 19:11:20 +00:00
4bb0e063cc
D1.5: Replace broken TMEM round-trip with correction epilogue (paired atoms)