biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 22:52:41 +00:00

92200367f3 FMHA kernel fix: N_orig vs N_padded — correct softmax masking for seq_len < 128

d40821c843 single_shot: fix memory (no double-loading MoE weights), FMHA short-seq fallback

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 22:45:55 +00:00

91568e12d4 single_shot_inference.py: production kernel stack version

fb96c34b89 rename: single_shot_inference.py → single_shot_PYTORCH_REFERENCE.py

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 22:30:37 +00:00

79d1a83348 Add NEXT_STEPS.md: post v0.1 issues, kernel migration plan, lessons learned

biondizzle pushed tag v0.1-e2e-working to biondizzle/nvfp4-megamoe-kernel

2026-05-31 22:27:27 +00:00

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 22:03:56 +00:00

acc20dffd7 CRITICAL FIX: don't fold input_scale into NVFP4 weight dequant

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:57:48 +00:00

4e64acbb64 fix MoE gate BF16/NVFP4 handling, add attention diagnostics

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:54:57 +00:00

0d2b5ceb93 fix positions device mismatch: move to rope cache device in forward_attention

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:52:00 +00:00

2676476013 fix mHC pre_block bmm dtype mismatch: A is FP32, X is BF16

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:49:01 +00:00

eb08cd06d1 Rewrite single_shot_inference.py: correct weight keys, NVFP4 two-level scale, compressor+indexer connected

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:42:53 +00:00

4988e77179 probe key format

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:41:32 +00:00

ba915dbd53 add probe_shapes script

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:38:52 +00:00

c54dd15550 find hc keys

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:36:29 +00:00

52b4971711 Full E2E single-shot: compressor, indexer, correct checkpoint keys (layers.{li}.attn/ffn)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:26:06 +00:00

cec17fee7d fixed prefix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:25:41 +00:00

696f3261ab focused key dump

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:25:00 +00:00

b7c9bb1262 dump all keys

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:24:37 +00:00

54e2a3684a filter expert keys

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:24:18 +00:00

bafabda01f add checkpoint key dump script

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:13:22 +00:00

23f1cf4065 Fix HcHead: use FP32 for RMSNorm + linear (matches HF reference)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 21:13:07 +00:00

274ea13251 Fix critical bug: add hc_head for final mHC readout (was using stream 0)