Files
biondizzle ae26f6b83c Fix dense router BF16 dispatch: use torch.matmul instead of F.linear
- F.linear(x, W) computes x @ W.T which caused shape mismatch when
  W_gate was pre-transposed to [E, H]
- Use torch.matmul(x, W_gate) instead — computes x @ W directly, no
  transpose needed, no FP32 conversion, fully graph-capturable
- W_gate stays as [H, E] (original checkpoint shape)
2026-06-04 05:58:24 +00:00
..