biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:21:18 +00:00)
e0a11e32f8 D1.3: Fix coord extraction - identity tensor stores (m,k) pairs as values
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:20:13 +00:00)
a7171fa5e1 D1.3: Fix coordinate indexing - tTMEM_LOADcS first mode is (32,1) nested tuple
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:19:23 +00:00)
df7bc40d37 D1.3: Direct coordinate-indexed SMEM-P write using tTMEM_LOADcS coords
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:17:32 +00:00)
2bbe55b08c D1.3: Use make_cotiled_copy for SMEM-P — custom TV layout from TMEM-load coords to sP
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 23:03:37 +00:00)
2e86ed939e Add SMEM-P guidance request document for CUTLASS LLM consultation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:30:57 +00:00)
029c21a2af D1.3: Use const_expr for lse None check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:30:16 +00:00)
1720a0e86b D1.3: Fix LSE with const_expr, always create valid mLSE tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:29:12 +00:00)
bce31176aa D1.3: Try make_tiled_copy_C(qk_mma) for SMEM-P copy - zero-fill source for compile test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:28:16 +00:00)
c80bd021c9 D1.3: Define SMEM-P copy atoms unconditionally (CuTeDSL scoping)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:27:12 +00:00)
43bb501acb D1.3: Use full sP (4D) for make_tiled_copy_D partition
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:26:11 +00:00)
06fd2f63e9 D1.3: SMEM-P via get_smem_store_op + make_tiled_copy_D
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:24:18 +00:00)
c507d0640c D1.3: Enhanced diagnostic - test QK C-fragment as source for make_tiled_copy_C
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:21:33 +00:00)
bf896c0894 D1.3: Skip fragment creation in diagnostic, just print layouts
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:20:17 +00:00)
0c435b3e51 D1.3: Fix diagnostic - use dummy ptr 0 for shape analysis
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:19:12 +00:00)
55caf8be38 D1.3: Fix sP allocation - p_smem_s.outer is already a layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:17:58 +00:00)
d1c600f599 D1.3: Fix layout diagnostic - compute c_major outside kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:16:58 +00:00)
d3d0020b4e D1.3: Layout diagnostic v2 - run inside JIT-compiled kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:15:49 +00:00)
ec8fd1474c D1.3: Fix layout diagnostic - remove JIT-dependent code
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:14:37 +00:00)
c3a7c30f20 D1.3: Layout diagnostic - print all QK C-fragment and PV A-operand shapes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-23 22:07:54 +00:00)
b59aca4655 Update all .md files with D5a/D5b progress, tOrP0 fix, LSE formula