[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146)

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Chih-Chieh Yang
2025-05-06 20:59:30 -04:00
committed by GitHub
parent 6de3e13413
commit 18dd5e01f2
8 changed files with 151 additions and 123 deletions

@@ -313,7 +313,6 @@ class BambaModel(nn.Module):
mamba2_metadata = prepare_mamba2_metadata(
    chunk_size=self.config.mamba_chunk_size,
    input_ids=input_ids,
    attn_metadata=attn_metadata,
)