[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711)
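
The idea in the title can be sketched as a cache keyed by content hash instead of request ID, so that two requests carrying the same multimodal input reuse one encoder output. This is a minimal illustration, not the code in this commit: `SharedEncoderCache`, its methods, and the eager eviction policy are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEncoderCache:
    """Hypothetical cache keyed by content hash, not by request ID."""

    # mm_hash -> cached encoder output (any object stands in for a tensor)
    embeddings: dict[str, object] = field(default_factory=dict)
    # mm_hash -> IDs of the requests currently referencing the entry
    users: dict[str, set[str]] = field(default_factory=dict)

    def lookup(self, req_id: str, mm_hash: str) -> bool:
        """Return True on a hit and record req_id as a user of the entry."""
        if mm_hash not in self.embeddings:
            return False
        self.users[mm_hash].add(req_id)
        return True

    def store(self, req_id: str, mm_hash: str, embedding: object) -> None:
        """Insert a freshly computed embedding on behalf of req_id."""
        self.embeddings[mm_hash] = embedding
        self.users.setdefault(mm_hash, set()).add(req_id)

    def release(self, req_id: str, mm_hash: str) -> None:
        """Drop req_id's reference; evict once no request uses the entry."""
        refs = self.users.get(mm_hash)
        if refs is None:
            return
        refs.discard(req_id)
        if not refs:
            # Assumption: evict eagerly. A real cache might defer this.
            del self.users[mm_hash]
            del self.embeddings[mm_hash]


cache = SharedEncoderCache()
image_hash = "abc123"  # placeholder; a real hash is derived from the input data
if not cache.lookup("req-0", image_hash):
    cache.store("req-0", image_hash, [0.1, 0.2])  # the encoder runs once
hit = cache.lookup("req-1", image_hash)  # the second request reuses the entry
```

Tracking users per hash is what makes sharing safe in this sketch: an entry is dropped only after every request that referenced it has released it.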

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Chenguang Zheng
2025-08-25 15:41:17 +08:00
committed by GitHub
parent 712d0f88d8
commit d765cf01fe
12 changed files with 365 additions and 154 deletions

@@ -205,6 +205,7 @@ def _construct_cached_request_state(req_id_suffix: int):
 pooling_params=None,
 mm_kwargs=[],
 mm_positions=[],
+mm_hashes=[],
 block_ids=([], ),
 generator=None,
 num_computed_tokens=len(output_token_ids),