biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:46:05 +00:00)
  1c14ada386 test: write O to tb (overwriting P), same as isolated test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:44:38 +00:00)
  a944f90040 test: match isolated TS test exactly (V=all-1, BLOCK_MN=16)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:43:25 +00:00)
  482328160a test: single PV K-tile debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:42:01 +00:00)
  3a40ed6d69 test: skip QK+softmax, write P directly to TMEM for PV debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:40:30 +00:00)
  f24bc583dc test: zero O TMEM before PV GEMM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:39:40 +00:00)
  2885b3f2ed test: full FMHA HD=16 with PV GEMM via tcgen05.mma TS
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:38:08 +00:00)
  dc2130cb12 test: cleanup TS MMA test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:36:33 +00:00)
  a767e90a12 test: B=2.0 to understand TS MMA scale factor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:34:58 +00:00)
  b7c6971720 test: use 32x32b.x8 for A write (avoids 16x256b misalign)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:33:48 +00:00)
  a7c81d66ba test: step-by-step TMEM write/read debug for TS MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:32:23 +00:00)
  c05cc1ac93 test: separate TMEM regions for A and C in TS MMA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:31:27 +00:00)
  37a502e476 test: minimal tcgen05.mma TS debug (PV GEMM)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:11:35 +00:00)
  efa03f53d4 docs: update CURRENT_ISSUE and MEMORY — full FMHA HD=64 pipeline working
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:10:03 +00:00)
  654a2ae7f4 test: merge softmax+PV into single warp0 block (s_vals scope fix)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:09:22 +00:00)
  5c9e3c41af test: full FMHA HD=64 — QK+softmax+PV(register math)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:08:09 +00:00)
  0672373e51 test: debug — just QK+softmax+P read (no PV)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:06:54 +00:00)
  5d75decd57 test: full FMHA HD=16 — PV via register math (decode T=1)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:05:36 +00:00)
  f62772992b test: full FMHA HD=16 with PV GEMM (separate TMEM for P and O)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:04:11 +00:00)
  bd15bce853 test: HD=16 QK+softmax (no PV)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-28 13:03:15 +00:00)
  38d7bcd776 test: HD=16 FMHA softmax only (skip PV for now)