biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:10:53 +00:00
b8c8da91fe fix: restore RoPE functions that were lost during mHC refactor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:07:00 +00:00
3f04a72af4 refactor: use production mHCLayer from dsv4.layers.mhc
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:57:01 +00:00
b519108cab fix: restore kv_cache.append that was accidentally removed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:53:02 +00:00
22a89b5a45 add attention sinks to SDPA path (paper D5c)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:45:10 +00:00
1905f19b8d fix: define q_input before USE_SDPA branch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:42:13 +00:00
cd073ad867 use PyTorch SDPA for correctness (no sink bias in FMHA kernel yet)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:32:18 +00:00
171a9e0d10 disable diagnostics for clean production run
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:28:54 +00:00
3f9b441428 diag: fix n_layers reference in forward_layer, add late-layer diags
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:25:57 +00:00
5b834a0599 diag: add late-layer diagnostics, fix ffn ctx variable
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:16:10 +00:00
690c0a1121 CRITICAL FIX: mHC base/scale ordering was wrong
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:12:55 +00:00
c3a2656c48 diag: add FFN and pre_block diagnostics
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:10:07 +00:00
79ba7e6636 diag: add mHC diagnostics for first 3 layers
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 03:04:59 +00:00
a262492e51 fix: FMHA K/V tensor shape (was permuting cache), add q_a_norm and kv_norm
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 02:54:49 +00:00
3f12bbc374 fix: move positions tensor to correct GPU for RoPE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 02:53:42 +00:00
0c3d168c60 single_shot: stream weights per-layer from CPU, fix KV/RoPE logic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 02:50:34 +00:00
61160ace13 fix: expert_weights/ids scoping in hash routing path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 02:45:55 +00:00
d772885d7e single_shot_inference: proper mHC+RMSNorm+inverse RoPE pipeline
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:31:35 +00:00
523b0e47b1 Add gentle RMSNorm: only clamps when values exceed unit norm
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:27:52 +00:00
dcbb74841a Remove emergency RMSNorm from mHC post_block — MoE provides balance now
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 00:22:10 +00:00
1de241ccfe Fix: add all_tokens tracking for decode loop
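
Note on commit 523b0e47b1 ("gentle RMSNorm: only clamps when values exceed unit norm"): the feed does not show the implementation, so the following is only a minimal sketch of one plausible reading. It assumes "unit norm" means a root-mean-square of 1 and that "clamps" means rescaling the whole vector down to that level. The function name `gentle_rms_norm` and the `eps` parameter are hypothetical, not taken from the repository.

```python
import math


def gentle_rms_norm(values, eps=1e-6):
    """Rescale `values` only when their RMS exceeds 1 (assumed meaning of
    "unit norm"); vectors at or below that level are returned unchanged."""
    if not values:
        return []
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    if rms <= 1.0:
        # Already within the assumed bound: leave the activations alone.
        return list(values)
    # Above the bound: divide by the RMS so the result has RMS of about 1.
    return [v / (rms + eps) for v in values]


small = gentle_rms_norm([0.1, -0.2, 0.3])   # untouched
large = gentle_rms_norm([3.0, 4.0])         # rescaled to RMS of about 1
```

Unlike a standard RMSNorm, which always rescales, this variant is the identity on small inputs, so under this reading it would act as a safety bound and not as a normalization layer.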