biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:21:25 +00:00
bb92af5b0c
FIX: Use full 8D indexing for tBgK/tVgV — mode 4 is the GMEM tile dim
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:20:47 +00:00
2a9f764f8b
Diagnostic: check tBgK/tVgV layout strides for degenerate dims
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:14:44 +00:00
ae173d3963
Test identity diag multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:13:39 +00:00
a35fb1b077
Minimal reference FMHA test: n=256 only
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:12:50 +00:00
4fc264e034
Test reference FMHA with proper API
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 21:12:03 +00:00
a800a83d5c
Test: CUTLASS reference FMHA on B200 multi-tile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:37:23 +00:00
2a14c2dd18
REVERT to working baseline (n=128 cos 0.999998). Multi-tile TMA is a CuTeDSL JIT limitation.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:36:21 +00:00
1ab326f2d2
Test: use kvh.index (pipeline state) as TMA GMEM coordinate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:35:46 +00:00
7b8b022e23
SMEM counter: separate allocate_tensor instead of struct field
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:35:19 +00:00
462778efcf
Fix SMEM counter type: cutlass.Int32 for MemRange
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:34:54 +00:00
f5c827d0b9
SMEM-backed kv_coord counter — JIT can't constant-fold SMEM reads
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:34:23 +00:00
215282971c
DEBUG: hardcoded Int32(1) to test if TMA can read tile 1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:34:06 +00:00
79ebe20a39
DEBUG: use Int32(kt) directly to test if coordinate matters
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:33:42 +00:00
b3778896b9
Test: kv_coord = warp_idx() * 0 — force SSA from runtime value
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:33:05 +00:00
1de848c5ca
DEBUG: add cute.printf for kv_coord runtime value
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:32:46 +00:00
587c16679c
Test: Python range() instead of cutlass.range() for TMA loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:32:27 +00:00
91230fe5e6
Test example9: drop try_acquire/pk, single loop-carried kv_coord
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:28:18 +00:00
3c0451a3e5
REVERT to working example7 (n=128 cos 0.999998). Example8 TMA fix didn't work.
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:28:01 +00:00
880bd9ef81
Update stage_c test to example8: SSA kv_coord + per-tile O rescale
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel, 2026-05-22 20:25:30 +00:00
c395b279d2
Clean up tests: archive superseded files, keep only essential unit tests