biondizzle
0 Followers
·
0 Following
Joined on
2025-12-10
Public Activity
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:57:40 +00:00
d518fcb82a
test: correct sink bias reference — denominator-only, no V contribution
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:53:58 +00:00
9574a9dc2e
test: add sink bias to reference SDPA in decode FMHA comparison
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:50:24 +00:00
9a9b347b2b
test: add per-head magnitude ratio diagnostics to decode FMHA test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:47:03 +00:00
f5fa20c581
fix: syntax error — missing closing paren in indexer.forward call
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:46:26 +00:00
693975ec92
fix: device mismatches in decode FMHA test — dec_pos must be on per-layer GPU
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:39:16 +00:00
e1d96c509d
test: decode FMHA layer comparison — checks FMHA accuracy during decode step
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:34:33 +00:00
1ebe7f0dde
Add PART_A_NEXT_SESSION.md: clues for decode degeneration debugging
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:20:38 +00:00
d8306be3f2
Fix PART A test: proper FP8 quantization and MQA reference
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:18:26 +00:00
4126909dfb
Simplify PART A test: compressor + FMHA at production scale
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:15:43 +00:00
8c54cfa748
Fix KVCache init in PART A test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 04:13:54 +00:00
04cf8ca848
Add PART A diagnostic tests: compressor + KV cache + FMHA at production scale
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:50:00 +00:00
75288bd12f
Wire prefill FMHA into production.py and single_shot
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:48:19 +00:00
5417f65b08
CRITICAL FIX: Add T-dimension strides to prefill FMHA kernel
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:47:02 +00:00
dd1cbe1faa
Fix smem size for prefill debug test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:46:38 +00:00
09384a637a
Fix constexpr issues in prefill debug test
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:46:19 +00:00
d3dc8cf901
Add prefill T=2 debug CUDA test with intermediate value printing
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:22:52 +00:00
223c22488f
Simplify prefill PV read: use decode kernel's exact pattern
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:05:28 +00:00
2bf5e74e61
Add prefill debug test: compare T=1 decode vs prefill kernel step by step
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 03:01:06 +00:00
eb69c3bfb9
CRITICAL FIX: add missing tb base in QK TMEM read address
biondizzle
pushed to
master
at
biondizzle/nvfp4-megamoe-kernel
2026-06-03 02:59:21 +00:00
99b6de316b
Fix prefill kernel: add missing tb base in PV TMEM read, fix ACCUMULATE for per-row PV