[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)

Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
(cherry picked from commit f5f51e5931)
This commit is contained in:
Roger Wang
2025-12-16 14:18:17 -08:00
committed by Kevin H. Luu
parent e397bd6592
commit 16484d394c
14 changed files with 306 additions and 130 deletions

View File

@@ -38,7 +38,7 @@ class MockRequest:
)
self.mm_features.append(feature)
def get_num_encoder_tokens(self, input_id: int) -> int:
def get_num_encoder_embeds(self, input_id: int) -> int:
assert input_id < len(self._token_counts)
return self._token_counts[input_id]