biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:45:03 +00:00)
d64b62bc80 test: simple (128,16) TMA desc for K sub-tile only

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:44:20 +00:00)
eaf8a878cf fix: only warp 0 lane 0 issues TMA (not all lane 0 threads)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:43:45 +00:00)
69bf20b09d fix: SMEM alignment in TMA K-only test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:43:08 +00:00)
2c0ee69aea test: TMA K-only — proven gen pattern + TMA for K loads only

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:42:05 +00:00)
9fc2d549e4 fix: warp-collective TMEM read/dealloc in minimal QK test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:40:25 +00:00)
c755e6fdde fix: TMEM read/dealloc for 128-thread kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:40:14 +00:00)
bd1309ba88 test: minimal QK — 128 threads, tid==0 MMA, match working gen kernel pattern

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:37:39 +00:00)
39aef1284f fix: smem size in minimal QK test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:37:22 +00:00)
ce89fe9170 test: minimal QK — separate sQ0/sK0, clean SMEM layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:35:01 +00:00)
71b353577d fix: QK direct test — per-K-sub-tile Q load (same as working kernel)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:33:40 +00:00)
35d0596893 fix: T=1 for QK direct test (write_q_to_smem only handles row 0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:32:28 +00:00)
bee7cc5f8f fix: lane vs threadIdx.x in direct QK test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:31:59 +00:00)
670599b754 test: direct QK GEMM — baseline for TMA comparison

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:30:54 +00:00)
9a185f0222 test: debug Q SMEM canonical after TMA load

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:29:51 +00:00)
1500020593 test: QK-only TMA test — isolate TMA load + canonical + MMA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:28:48 +00:00)
204cc90808 fix: load full Q (128,HD) once before QK loop — not per K-sub-tile

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:27:08 +00:00)
bf7cf54a51 fix: align TMA SMEM to 128 bytes in verification test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:26:39 +00:00)
befc2c647b test: TMA load verification — compare against direct GMEM read

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:25:48 +00:00)
8e09fae3a1 fix: warp-stride for TMA canonical writes — only load warp calls them

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-29 18:24:49 +00:00)
3e14a25bb0 fix: don't re-init mbarrier in loop — use phase parity tracking