single_shot_inference: proper mHC+RMSNorm+inverse RoPE pipeline

Major rewrite of single_shot_inference.py:
- Replace broken mHC (gentle normalization hack) with proper Sinkhorn-Knopp
- Add RMSNorm before each sub-block (attention + FFN)
- Add inverse RoPE on attention output (paper §2.3.3)
- Fix KV cache: RoPE applied before caching, K=V in DSV4 MQA
- Fix MoE: proper dense routing with e_bias, SwiGLU clamping
- Proper weight mapping: fn→W_stacked, base→S_pre/S_res/S_post, scale→alphas
- Add identity mHC fallback when weights missing
- No emergency normalization, no bandaids
This commit is contained in:
2026-05-31 02:45:52 +00:00
parent 523b0e47b1
commit d772885d7e
2 changed files with 588 additions and 303 deletions