[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
@@ -1563,7 +1563,12 @@ class EngineArgs:
                 self.enable_prefix_caching = False

             if self.enable_prefix_caching is None:
-                self.enable_prefix_caching = True
+                # Disable prefix caching default for hybrid models
+                # since the feature is still experimental.
+                if model_config.is_hybrid:
+                    self.enable_prefix_caching = False
+                else:
+                    self.enable_prefix_caching = True
         else:

             pooling_type = model_config.pooler_config.pooling_type
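The effect of the hunk above can be sketched as a small standalone function. This is only an illustration of the visible branch, not code from the commit: the helper name `resolve_prefix_caching_default` and its parameters are hypothetical, and conditions outside the hunk (such as the earlier assignment to `False`) are not modeled.

```python
from typing import Optional


def resolve_prefix_caching_default(
    enable_prefix_caching: Optional[bool], is_hybrid: bool
) -> bool:
    """Hypothetical helper mirroring the defaulting branch shown in the hunk.

    `enable_prefix_caching` stands in for `self.enable_prefix_caching`
    and `is_hybrid` for `model_config.is_hybrid`.
    """
    if enable_prefix_caching is None:
        # No explicit setting: hybrid models default to off because the
        # feature is still experimental for them; other models default to on.
        return not is_hybrid
    # An explicit setting is returned unchanged by this branch.
    return enable_prefix_caching


if __name__ == "__main__":
    print(resolve_prefix_caching_default(None, is_hybrid=True))   # False
    print(resolve_prefix_caching_default(None, is_hybrid=False))  # True
    print(resolve_prefix_caching_default(True, is_hybrid=True))   # True
```

Under these assumptions, only the unset (`None`) case changes behavior: a hybrid model that previously got prefix caching by default now has it off unless `enable_prefix_caching` is set explicitly.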