biondizzle 3320abfe24 Fix two correctness bugs: compressor pos bias on KV + SwiGLU clamp ordering
1. Compressor positional bias was being added to BOTH gate (softmax logit)
   AND KV content. Per paper eq. 9-12, position bias is only for the
   softmax logits (Z+B), NOT the KV content (C). Adding pb to kv_val
   corrupts every compressed KV entry with learned positional-bias content.
   Fixed in both CSA and HCA paths in compressor_reduce.cu.
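
   A minimal scalar sketch of the intended behaviour, assuming the
   compressor pools a block of KV entries with softmax weights. The names
   `compress_block`, `z`, `b`, and `c` are hypothetical and are not taken
   from compressor_reduce.cu; the point is only that the bias enters the
   logits and never the pooled content.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress_block(z, b, c):
    """Pool one block of KV content into a single compressed entry.

    z: gate logits, one per token in the block (hypothetical name)
    b: positional bias, one per slot in the block (hypothetical name)
    c: KV content vectors, one list of floats per token (hypothetical name)
    """
    # The positional bias is added to the softmax logits only (Z + B).
    weights = softmax([zi + bi for zi, bi in zip(z, b)])
    dim = len(c[0])
    # The content C is pooled as-is; no bias is added to it.
    return [sum(w * ci[d] for w, ci in zip(weights, c)) for d in range(dim)]
```

   Because the output is a convex combination of the rows of `c`, a block
   whose content rows are all identical compresses to that same row
   whatever the bias is. Adding the bias to the content as well would
   break that property.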

2. SwiGLU clamp ordering: the code clamped silu(gate) instead of clamping
   the raw gate before SiLU. Per paper §4.2.3, clamp first
   (gate = clamp(gate, max=limit)), then compute silu(gate) * clamp(up).
   Fixed in moe.py (both unfused paths) and fused_swiglu.py (CuTeDSL
   kernel). shared_expert.py was already correct.
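
   A scalar sketch contrasting the two orderings. The function names are
   hypothetical, and the symmetric [-limit, limit] clamp on `up` is an
   assumption; the message above only says clamp(up) without giving
   bounds.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def clamp_up(up, limit):
    # Assumption: `up` is clamped symmetrically to [-limit, limit].
    return max(-limit, min(up, limit))


def swiglu_fixed(gate, up, limit):
    """Clamp the raw gate (upper bound only), then apply SiLU."""
    gate = min(gate, limit)
    return silu(gate) * clamp_up(up, limit)


def swiglu_buggy(gate, up, limit):
    """Pre-fix ordering: apply SiLU first, then clamp its output."""
    return min(silu(gate), limit) * clamp_up(up, limit)
```

   The two orderings disagree once the gate exceeds the limit: with
   gate=10 and limit=7, the fixed version yields silu(7), slightly below
   7, while the buggy version saturates at exactly 7.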
2026-06-03 11:17:49 +00:00