biondizzle
0 Followers · 0 Following · Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:32:03 +00:00 · 4459ddefdd · feat: 6-warp TMA FMHA kernel + test — TMA for K loads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:30:52 +00:00 · 7a8ba8eeb6 · fix: SMEM size calculation — TILE_SZ is in BF16 elements, need *sizeof(bf16_t) for bytes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:29:37 +00:00 · aac1b25442 · test: TMA QK diagnostic — 3 variants to isolate failure
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:28:24 +00:00 · 9dfada6626 · test: TMA + canonical + QK GEMM incremental
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:27:35 +00:00 · 0435e229bd · fix: typo cuda_SUCCESS -> cudaSuccess
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:26:59 +00:00 · 74514e2680 · test: TMA sub-tile load — exact pattern from test_qk_softmax
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:26:10 +00:00 · e449d6d5e1 · test: TMA diagnostic with 192 threads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:25:39 +00:00 · 0b36b6047a · test: TMA diagnostic with 128 threads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:25:04 +00:00 · a766b488c2 · test: minimal TMA diagnostic — isolate multi-warp TMA bug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:12:29 +00:00 · fe3b6b8d13 · test: QK+softmax T=1 first
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:11:21 +00:00 · a9a87fe7b8 · fix: P write with lane stride, use sRowSum
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:10:10 +00:00 · fd6a9b00ae · test: QK + softmax — verify P values against reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:08:49 +00:00 · 5eff53c145 · fix: SMEM layout and printf in PV-only test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 19:06:54 +00:00 · 106f103c83 · test: PV-only GEMM — isolate PV from full FMHA pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:57:42 +00:00 · 5542a9da00 · debug: V loaded directly from GMEM (not TMA) to isolate PV issue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:54:03 +00:00 · 2262e10fca · fix: PV GEMM — V canonical uses CORES_MN_V=2 (block_mn=16), not 16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:51:00 +00:00 · 90c3372040 · refactor: TMA FMHA kernel — 4-warp, proven pattern, full pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:48:41 +00:00 · d5e20b2d42 · fix: reference should be raw dot product (MMA is unscaled)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:47:30 +00:00 · 2b945f255b · test: TMA K-load + QK GEMM — incremental from working pattern
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel · 2026-05-29 18:46:10 +00:00 · f33746f183 · test: minimal TMA K-load — no MMA/TMEM, just verify TMA + canonical