biondizzle
0 Followers
·
0 Following
Joined on
2025-12-10
Repositories
25
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:55:03 +00:00)
9034f67b0f Fix prefill kernel: read ALL n_sub PV results (was only n_sub=0)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:50:28 +00:00)
a4ef6c3454 Add B1 mixed FP8 prefill FMHA kernel (T>1 support)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:47:49 +00:00)
1f757151ef Fix router gate BF16 quantize path for production FMHA test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:38:17 +00:00)
07168357cc Fix o_a_proj weight loading: add BF16 fallback for grouped linear

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:31:12 +00:00)
27d8d80a40 Fix missing DEVICE constant in production FMHA test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:26:40 +00:00)
26a817c2f2 Fix production FMHA layer test: compare raw FMHA vs SDPA on production gathered KV

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:22:36 +00:00)
ba67e055f7 Add production FMHA layer comparison test

biondizzle pushed tag v-b1-b2-done-20260603 to biondizzle/nvfp4-megamoe-kernel (2026-06-03 02:14:54 +00:00)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:53:00 +00:00)
af58f2c5b2 Add B1 weight/format verification at L0 in single_shot

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:51:00 +00:00)
8df5de5477 Update B1 docs with test results and bug fix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:50:23 +00:00)
3e3b352e7e Update FINAL_STRETCH.md: B1 and B2 marked DONE with test results and bug fixes

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:49:40 +00:00)
84a02f8995 Remove debug test files, keep production B1/B2 unit tests

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:42:48 +00:00)
6fa9ad7852 B2 indexer: adopt TMEM warp-to-row mapping fix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:12:12 +00:00)
6c92ff91f3 B2 indexer: temporary heads 0-31 only while figuring out TMEM row 32-63 layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 01:09:03 +00:00)
7732c93f62 Fix B2 indexer: use 16x256b.x1 TMEM read with TMEM_COLS=512

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 00:59:09 +00:00)
a75a9843af Fix B2 indexer: add sLogits scratch buffer to SMEM layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 00:55:30 +00:00)
cc7b17fdaa Fix B2 indexer: use 2-warps for TMEM read (P7 row-slice model)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 00:52:42 +00:00)
8d0a02ca67 B2 TMEM debug: try stride=SK_TILE/8=16 for row group 32-63

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 00:50:53 +00:00)
fdf702470c Add B2 TMEM read debug kernel and test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-06-03 00:46:51 +00:00)
f1cf4c0215 Add B2 QK debug test with w_h=1 for simple comparison