biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 22:52:41 +00:00)
  92200367f3 FMHA kernel fix: N_orig vs N_padded — correct softmax masking for seq_len < 128
  d40821c843 single_shot: fix memory (no double-loading MoE weights), FMHA short-seq fallback
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 22:45:55 +00:00)
  91568e12d4 single_shot_inference.py: production kernel stack version
  fb96c34b89 rename: single_shot_inference.py → single_shot_PYTORCH_REFERENCE.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 22:30:37 +00:00)
  79d1a83348 Add NEXT_STEPS.md: post v0.1 issues, kernel migration plan, lessons learned
biondizzle pushed tag v0.1-e2e-working to biondizzle/nvfp4-megamoe-kernel (2026-05-31 22:27:27 +00:00)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 22:03:56 +00:00)
  acc20dffd7 CRITICAL FIX: don't fold input_scale into NVFP4 weight dequant
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:57:48 +00:00)
  4e64acbb64 fix MoE gate BF16/NVFP4 handling, add attention diagnostics
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:54:57 +00:00)
  0d2b5ceb93 fix positions device mismatch: move to rope cache device in forward_attention
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:52:00 +00:00)
  2676476013 fix mHC pre_block bmm dtype mismatch: A is FP32, X is BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:49:01 +00:00)
  eb08cd06d1 Rewrite single_shot_inference.py: correct weight keys, NVFP4 two-level scale, compressor+indexer connected
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:42:53 +00:00)
  4988e77179 probe key format
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:41:32 +00:00)
  ba915dbd53 add probe_shapes script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:38:52 +00:00)
  c54dd15550 find hc keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:36:29 +00:00)
  52b4971711 Full E2E single-shot: compressor, indexer, correct checkpoint keys (layers.{li}.attn/ffn)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:26:06 +00:00)
  cec17fee7d fixed prefix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:25:41 +00:00)
  696f3261ab focused key dump
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:25:00 +00:00)
  b7c9bb1262 dump all keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:24:37 +00:00)
  54e2a3684a filter expert keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:24:18 +00:00)
  bafabda01f add checkpoint key dump script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:13:22 +00:00)
  23f1cf4065 Fix HcHead: use FP32 for RMSNorm + linear (matches HF reference)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-31 21:13:07 +00:00)
  274ea13251 Fix critical bug: add hc_head for final mHC readout (was using stream 0)