[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
@@ -138,8 +138,7 @@ class BambaMixerDecoderLayer(nn.Module):
         else:
             hidden_states, residual = self.input_layernorm(hidden_states, residual)

-        output = torch.empty_like(hidden_states)
-        self.mamba(hidden_states, output)
+        output = self.mamba(hidden_states)

         # Fully Connected
         hidden_states, residual = self.pre_ff_layernorm(output, residual)
         hidden_states = self.feed_forward(hidden_states)
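The hunk above swaps an out-parameter call convention (caller preallocates `output`, the mixer writes into it) for an ordinary return value. A minimal, torch-free sketch of the two conventions follows; the class names `MixerOutParam`/`MixerReturn` and the doubling arithmetic are hypothetical stand-ins, not vLLM's actual mamba2 code.

```python
class MixerOutParam:
    """Old style: the caller preallocates `output` and the mixer mutates it.
    In the original code this whole call sat behind a custom op, which hid
    the mixer's linear projections from torch.compile."""

    def forward(self, hidden_states, output):
        # Write results into the caller-supplied buffer in place.
        for i, h in enumerate(hidden_states):
            output[i] = 2.0 * h  # stand-in for the mixer computation


class MixerReturn:
    """New style: the mixer allocates and returns its own output, so the
    surrounding projections remain ordinary, traceable operations."""

    def forward(self, hidden_states):
        # Allocate and return the result directly.
        return [2.0 * h for h in hidden_states]
```

With the return-value style, the call site simplifies from two statements (`output = torch.empty_like(...)` plus a mutating call) to the single `output = self.mamba(hidden_states)` shown in the diff.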