[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846)

Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Author: Zhuohan Li
Date: 2024-05-27 15:18:17 -07:00 (committed by GitHub)
Parent: f17a1a8f96
Commit: 1102bef219
11 changed files with 167 additions and 44 deletions


@@ -74,7 +74,6 @@ class Starcoder2Attention(nn.Module):
         self.rope_theta = config.rope_theta
         self.max_position_embeddings = config.max_position_embeddings
         self.use_bias = config.use_bias
-        self.sliding_window = config.sliding_window
         self.qkv_proj = QKVParallelLinear(
             self.hidden_size,
@@ -101,7 +100,6 @@ class Starcoder2Attention(nn.Module):
                               self.head_dim,
                               self.scaling,
                               num_kv_heads=self.num_kv_heads,
-                              sliding_window=self.sliding_window,
                               cache_config=cache_config,
                               quant_config=quant_config)
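
Note: the hunks above drop Starcoder2Attention's stored self.sliding_window and the explicit sliding_window= argument to Attention; the layer still receives cache_config and quant_config, so any sliding-window handling presumably moves out of the per-model code. In the spirit of the commit title, a "prefix caching guard" rejects configurations the feature cannot support. Below is a minimal, self-contained sketch of that idea; the names (CacheConfig, check_prefix_caching_supported) are hypothetical and not vLLM's actual API.

# Hypothetical sketch of a prefix-caching guard: fail fast when automatic
# prefix caching is requested together with a setup it cannot support,
# e.g. sliding-window attention. Illustrative only, not vLLM's real code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    enable_prefix_caching: bool = False
    sliding_window: Optional[int] = None  # window size in tokens, if any


def check_prefix_caching_supported(cache_config: CacheConfig) -> None:
    """Raise early if prefix caching is enabled for an unsupported setup."""
    if cache_config.enable_prefix_caching and cache_config.sliding_window is not None:
        raise ValueError(
            "Automatic prefix caching is not supported together with "
            "sliding-window attention in this sketch.")


if __name__ == "__main__":
    # Allowed: prefix caching without a sliding window.
    check_prefix_caching_supported(CacheConfig(enable_prefix_caching=True))
    print("ok: prefix caching allowed for this config")

Centralizing such a check in the engine/config layer means individual model files no longer need to thread sliding_window through their attention constructors, which is consistent with the removals shown above.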