biondizzle
0 Followers
·
0 Following
Joined on
2025-12-10
Repositories
25
Public Activity
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 16:56:10 +00:00
212fc85627
P6: One-way TMEM→regs→SMEM→TMA store epilogue
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:54:24 +00:00
05b5bf9db1
docs: mark P5 as done in NEXT_PRIORITIES.md
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:49:14 +00:00
95e0c8c464
P5: fix multi-tile test — use same Q data for kernel and reference
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:47:01 +00:00
e701a1411c
P5: use multi-tile kernel for N>128 in integration test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:46:16 +00:00
5932e928a8
cleanup: remove debug test files (P4, P5)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:45:03 +00:00
8fef46ce73
P5: add reference comparison to Python multi-tile test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:43:30 +00:00
897a70a491
P5: minimal Python multi-tile test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:40:02 +00:00
a2627359fb
P5: fix TMA desc creation — write to HOST then cudaMemcpy to device
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:38:35 +00:00
f370bfb1f1
P5: re-enable multi-tile Python tests, fix CAPI to use create_tma_desc_2d_bf16
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:35:36 +00:00
da54f6439f
P5: fix TMA multitile test (include cuda.h first, proper SMEM calc)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:34:23 +00:00
34320653e9
P5: standalone TMA multi-tile test with 128B-aligned memory
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:32:46 +00:00
a1d05b3055
P5: disable multi-tile Python tests (TMA descriptor alignment issue)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:30:42 +00:00
97531a68e6
fix: remove n_kv_tiles from capi too
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:28:39 +00:00
a5b47602b5
fix: remove n_kv_tiles from standalone test (struct doesn't have it anymore)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 10:27:40 +00:00
f032800eaa
P5: integrate WORKING multi-tile kernel (fmha_6warp_tma_multirow_multitile) into production
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 09:07:10 +00:00
032cb4c7b2
P5: add single-tile merge comparison to multitile test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 09:04:50 +00:00
d424ccbcc1
fix: const not constexpr for SCALE
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 09:04:07 +00:00
3da31de4c0
P5: fix BF16 host helpers for standalone test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 09:01:53 +00:00
9e6ba25a98
P5: standalone multi-tile CUDA test (2 KV tiles, hd=64)
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-05-30 08:59:53 +00:00
b61df2657b
P5: fix reference attention for MQA/GQA (kv_idx = h // q_per_kv)