biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:41:29 +00:00)
74e1c0420a D1.5: Implement correction epilog with paired atoms (get_tmem_load_op + get_smem_store_op)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:37:37 +00:00)
d96786ec44 D1.5: Add TODO for correction epilog - keeping working TMEM round-trip for now
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:35:06 +00:00)
ae7a1f5e0a D1.5: Revert to pre-epilog backup - correction epilog refactor is complex, will do incrementally
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:32:37 +00:00)
a59d57e4d5 D1.5: Fix TMA store - use local_tile with pv_mma_tiler
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:31:46 +00:00)
a6bf31a22e D1.5: Fix TMA store rank mismatch - use 2D sC_epi view
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:30:41 +00:00)
d316875145 D1.5: Implement correction epilog with get_tmem_load_op + get_smem_store_op paired atoms
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:24:27 +00:00)
8514a72ba0 D1.5: Replace TMEM round-trip normalize with correction epilog (one-way: TMEM→reg→SMEM→GMEM)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:16:24 +00:00)
26de7254ad D1.3: Fix LSE tensor layout for weakly congruent store
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:15:43 +00:00)
f259fafcae D1.3: Add unnormalized debug test to isolate SMEM-P vs O round-trip error
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:13:30 +00:00)
b72062e47c D1.3: Add SMEM-P write/read diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:10:20 +00:00)
6d6b91dcb4 D1.3: Add SMEM-P vs TMEM-P comparison test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:07:23 +00:00)
f1b8fef3a2 D1.3: Fix while loop in cotiled diag - precompute num_tmem_alloc_cols
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:06:52 +00:00)
921decb516 D1.3: Fix cotiled diagnostic - use proper MMA construction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:05:49 +00:00)
25747675cf D1.3: Add make_cotiled_copy diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-24 00:01:24 +00:00)
24bc318480 shit left dangling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:26:48 +00:00)
d092a1743a D1.3: Re-enable coordinate-indexed SMEM-P write with identity tensor coords
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:26:08 +00:00)
a17dca508d D1.3: Revert to zero-fill for sP - need to verify sP→PV pipeline first
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:24:55 +00:00)
5be5d42e94 D1.3: Compute (m,k) directly from thread mapping instead of identity tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:24:14 +00:00)
23964d28c0 D1.3: Add debug prints for SMEM-P coordinate mapping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:23:07 +00:00)
1e5635b93f D1.3: Add SMEM-P coordinate diagnostic test