biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:12:31 +00:00
a0363e8911 Fix CuTeDSL scoping: hoist P store vars out of if block

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:10:48 +00:00
86bf5771c1 Fix O rescale: use Stage C proven correction_rescale pattern

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:09:26 +00:00
e204aa7a4c Fix tOrP0 indexing: 3-dim slice (None,None,kb) not 4-dim

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:08:18 +00:00
dda5afee87 Fix CuTeDSL scoping: unconditionally define tOrP0 and tCrP

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:07:32 +00:00
0f715bfaff Fix CuTeDSL variable scoping: define tOrP0 and tCrP in both branches

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:06:47 +00:00
70db626550 Fix p_tmem_s: use ComposedLayout from make_smem_layout_a, pass as kernel arg

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 05:04:53 +00:00
09cac38a67 Consolidate FMHA stages A/B/C into unified kernel module with SMEM-P stub

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:56:57 +00:00
6f834ae8b5 WIP: make_tiled_copy_C for P→SMEM

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:54:50 +00:00
8114a225d1 fix: cpasync.CopyOp for reg→SMEM

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:52:49 +00:00
0dbdc4f865 fix: CopyAtomUniversalOp

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:51:59 +00:00
05173c1992 WIP: tiled copy for P→SMEM (zero fill)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:51:02 +00:00
5a9c299f64 fix: cute.copy(dst, src) order

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:50:11 +00:00
398f5cf631 fix: BFloat16 not Float32 for bf16 reg

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:49:06 +00:00
9bc7fc9361 WIP: P→SMEM write stub (zero fill, proper mapping TODO)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:47:55 +00:00
ed35a8a4ba fix: partition_A not partition_S

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:47:10 +00:00
48432522b8 fix: make_smem_layout_epi not make_epilogue_smem_layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:46:02 +00:00
07f319d1f3 WIP: SMEM P path for PV (compiles but P write not implemented)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:42:55 +00:00
1be005296c debug: hd=64 with CUDA_LAUNCH_BLOCKING

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:42:10 +00:00
482928f142 D1: P store as BF16 using PV A-fragment layout (tOrP0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-23 03:40:11 +00:00
f266c3dae2 D1: align P store and PV A-fragment layouts via tP