debug: disable inverse RoPE to check impact on output

2026-05-31 04:40:34 +00:00
parent c69dc51b3b
commit 92e465ca04

@@ -405,7 +405,9 @@ def forward_layer(X_l, w, li, cfg, rope_cos, rope_sin,
 attn_out = attn_out.permute(1, 0, 2) # (T, n_h, hd)
 # -- Inverse RoPE on attention output (paper §2.3.3) --
-attn_out = apply_inverse_rope(attn_out, positions_dev, rope_cos, rope_sin, hd, rd)
+# NOTE: disabling for debugging — check if this is causing issues
+# attn_out = apply_inverse_rope(attn_out, positions_dev, rope_cos, rope_sin, hd, rd)
+attn_out = attn_out # No inverse RoPE for now
 # -- Output projection: wo_a (grouped BMM) + wo_b (NVFP4) --
 # wo_a: grouped linear, (n_h, hd) → (n_groups, o_rank) via BMM
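
Reviewer note: for context on what this hunk switches off, here is a minimal sketch of an inverse rotary rotation. It is not the repository's `apply_inverse_rope`. The helper `rope_rotate`, the interleaved pairing of dimensions, and `base=10000.0` are all assumptions made for illustration, and `rd` is assumed to be the number of leading dimensions that receive the rotation. The sketch only shows that rotating by the negative angle undoes the forward rotation, and that skipping it leaves a measurable difference in the rotary dimensions.

```python
import math

def rope_rotate(vec, pos, rd, base=10000.0, inverse=False):
    """Rotate the first `rd` dims of `vec` in adjacent pairs (hypothetical helper).

    Each pair (i, i + 1) is rotated by pos * base ** (-i / rd).
    With inverse=True the angle is negated, which undoes the forward rotation.
    Dimensions at index >= rd are passed through unchanged.
    """
    out = list(vec)
    sign = -1.0 if inverse else 1.0
    for i in range(0, rd, 2):
        theta = sign * pos * base ** (-i / rd)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

# Toy head vector: hd = 6, of which rd = 4 dims are rotary.
v = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75]
rotated = rope_rotate(v, pos=7, rd=4)
restored = rope_rotate(rotated, pos=7, rd=4, inverse=True)

# Largest per-element deviation from the original vector,
# with and without the inverse rotation applied.
max_err_with = max(abs(a - b) for a, b in zip(v, restored))
max_err_without = max(abs(a - b) for a, b in zip(v, rotated))
```

When checking the impact of this change on the real model, the same kind of comparison (outputs with and without the inverse rotation, on identical inputs) is likely more informative than eyeballing generated text.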