[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>
This commit is contained in:
@@ -205,6 +205,7 @@ def _construct_cached_request_state(req_id_suffix: int):
|
||||
pooling_params=None,
|
||||
mm_kwargs=[],
|
||||
mm_positions=[],
|
||||
mm_hashes=[],
|
||||
block_ids=([], ),
|
||||
generator=None,
|
||||
num_computed_tokens=len(output_token_ids),
|
||||
|
||||
Reference in New Issue
Block a user