biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:17:43 +00:00
bafd26707b FMHA HD=64 with BLOCK_MN_B=16, 4 N-tiles per K-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:16:29 +00:00
6896d1aebb Update CURRENT_ISSUE: HD=16 done, HD=64 in progress
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:15:24 +00:00
6b9b06647a Clean up HD=64 debug prints, keep register-math PV check
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:13:49 +00:00
5c9d471162 Add register-math PV reference for HD=64 debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:12:22 +00:00
43e9efbc2b Fix string literal
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:11:17 +00:00
906be7ce50 Add filtered cosine (exclude near-zero)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:09:57 +00:00
40c83c769a Fix: remove ×2 QK scale correction (MMA scale is 1.0, not 0.5)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:07:56 +00:00
6ea7356fdd Debug: print P values for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:06:39 +00:00
4b052f22a5 Fix: opt into >48KB shared memory for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:03:28 +00:00
7becbfc07e Fix: printf after var declarations
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 15:02:01 +00:00
2d44f8e356 Debug: check if HD=64 kernel starts
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:58:13 +00:00
46e4d07c71 Test PV SS MMA with B=(64,16) BLOCK_MN=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:56:08 +00:00
465e089a2b Add launch error check for HD=64
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:54:13 +00:00
2fd64c464d FMHA HD=64 with BLOCK_MN_B=64 for V, proper output dimensions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:52:32 +00:00
15ecc1f616 Full FMHA HD=64 with PV SS MMA (SMEM-P)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:50:44 +00:00
5b2e690936 Milestone: Full FMHA HD=16 with PV SS MMA (SMEM-P) — cosine 0.9997
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:49:21 +00:00
78026839b7 Fix V canonical layout: swap g_mn/g_k indices (d=MN, lr=K)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:47:12 +00:00
9a3b43c42b Fix reference to also use uniform P
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:46:23 +00:00
75bdcbf728 Debug: override P with uniform 1/128
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-28 14:45:16 +00:00
af93c283c7 Enable all 8 PV K-tiles