biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 22:52:41 +00:00
92200367f3 FMHA kernel fix: N_orig vs N_padded — correct softmax masking for seq_len < 128
d40821c843 single_shot: fix memory (no double-loading MoE weights), FMHA short-seq fallback
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 22:45:55 +00:00
91568e12d4 single_shot_inference.py: production kernel stack version
fb96c34b89 rename: single_shot_inference.py → single_shot_PYTORCH_REFERENCE.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 22:30:37 +00:00
79d1a83348 Add NEXT_STEPS.md: post v0.1 issues, kernel migration plan, lessons learned
biondizzle pushed tag v0.1-e2e-working to biondizzle/nvfp4-megamoe-kernel 2026-05-31 22:27:27 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 22:03:56 +00:00
acc20dffd7 CRITICAL FIX: don't fold input_scale into NVFP4 weight dequant
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:57:48 +00:00
4e64acbb64 fix MoE gate BF16/NVFP4 handling, add attention diagnostics
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:54:57 +00:00
0d2b5ceb93 fix positions device mismatch: move to rope cache device in forward_attention
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:52:00 +00:00
2676476013 fix mHC pre_block bmm dtype mismatch: A is FP32, X is BF16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:49:01 +00:00
eb08cd06d1 Rewrite single_shot_inference.py: correct weight keys, NVFP4 two-level scale, compressor+indexer connected
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:42:53 +00:00
4988e77179 probe key format
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:41:32 +00:00
ba915dbd53 add probe_shapes script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:38:52 +00:00
c54dd15550 find hc keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:36:29 +00:00
52b4971711 Full E2E single-shot: compressor, indexer, correct checkpoint keys (layers.{li}.attn/ffn)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:26:06 +00:00
cec17fee7d fixed prefix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:25:41 +00:00
696f3261ab focused key dump
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:25:00 +00:00
b7c9bb1262 dump all keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:24:37 +00:00
54e2a3684a filter expert keys
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:24:18 +00:00
bafabda01f add checkpoint key dump script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:13:22 +00:00
23f1cf4065 Fix HcHead: use FP32 for RMSNorm + linear (matches HF reference)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 21:13:07 +00:00
274ea13251 Fix critical bug: add hc_head for final mHC readout (was using stream 0)