biondizzle
Joined on
2025-12-10
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 23:18:41 +00:00
1726f371c1
FIX: 8-None no-op pre-slice opens full TMA coordinate space (8 dims)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 23:08:30 +00:00
2f716a5856
FIX: tma_partition tensors have 4 modes, not 8. Mode 2 is GMEM tile dim.
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:58:11 +00:00
02969c15fc
Fix test_fmha_v3_stage_c.py: 8-mode TMA indexing (mode 4 = GMEM tile dim)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:57:55 +00:00
80be9af3a0
Fix README: multi-tile was layout bug not JIT bug, add example10, update status
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:40:10 +00:00
078071e98a
Add diag test with 8-mode TMA indexing from commit 2711611
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:38:09 +00:00
e69ead0c35
auto: pre-test commit
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:36:41 +00:00
beaf60db5c
DOCUMENT: TMA 8-mode indexing — the bug that cost us a full day. README + inline comments.
27116110ab
Fix identity diag: same 8D TMA indexing fix
bb92af5b0c
FIX: Use full 8D indexing for tBgK/tVgV — mode 4 is the GMEM tile dim
2a9f764f8b
Diagnostic: check tBgK/tVgV layout strides for degenerate dims
ae173d3963
Test identity diag multi-tile
Compare 8 commits »
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:31:34 +00:00
88b70e56fb
DEBUG: print flat_divide shapes
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:30:51 +00:00
f64af7b0e0
Switch gK/gV to flat_divide (CUTLASS FMHA reference pattern) for proper TMA strides
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:28:34 +00:00
daa4017505
DEBUG: print tBgK/tVgV layout to check strides
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:27:59 +00:00
be103deb6d
Try kt (cutlass.range induction) with correct (None,0,None,0) pre-slice
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:27:01 +00:00
d656598185
SSA-seed kv_coord: n_kv_tiles - n_kv_tiles forces JIT to track as runtime reg
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:26:18 +00:00
5de530fe18
Fix tBgK pre-slice: (None,0,None,0) preserves kv_tiles at mode 2 (matching tVgV)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:25:37 +00:00
1b35e0f967
DEBUG: print TMA partition tensor shapes
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:25:02 +00:00
16f60e2dd1
Fix multi-tile TMA: loop-carried kv_coord (CUTLASS reference pattern)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:18:53 +00:00
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:17:47 +00:00
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:16:24 +00:00
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:04:31 +00:00
8138e5c62a
Dynamic blk_coord in pre-slice (matching reference pattern)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-22 22:03:42 +00:00
55cdfad7c6
Pass seqlen_k as kernel arg, derive kv_coord dynamically (force SSA tracking)