[V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
@@ -647,10 +647,23 @@ class GPUModelRunner:
             self.mm_registry.get_max_tokens_per_item_by_modality(
                 self.model_config).values())
 
-        max_num_mm_items = min(
+        max_num_mm_items_encoder_budget = min(
             self.max_num_encoder_input_tokens,
             self.encoder_cache_size) // max_tokens_per_mm_item
 
+        max_mm_items_per_req = max(
+            self.mm_registry.get_mm_limits_per_prompt(
+                self.model_config).values())
+
+        # NOTE: We do not consider max_num_batched_tokens on purpose
+        # because the multimodal embeddings can be generated in advance
+        # and chunked prefilled.
+        max_num_mm_items_decoder_budget = self.max_num_reqs * \
+            max_mm_items_per_req
+
+        max_num_mm_items = min(max_num_mm_items_encoder_budget,
+                               max_num_mm_items_decoder_budget)
+
         # Dummy data definition in V0 may contain multiple multimodal items
         # (e.g, multiple images) for a single request, therefore here we
         # always replicate first item by max_num_mm_items times since in V1
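The budget math in this hunk can be sketched standalone. The concrete numbers below (token budget, cache size, per-item token count, request limit) are hypothetical placeholders, not values from vLLM's configuration; the point is that the encoder budget counts how many multimodal items fit in the smaller of the encoder-input and encoder-cache limits, the decoder budget counts items across all batched requests, and the dummy run uses the tighter of the two.

    # Hypothetical values standing in for the runner's config (not vLLM's API).
    max_num_encoder_input_tokens = 8192   # assumed encoder token budget
    encoder_cache_size = 8192             # assumed encoder cache capacity
    max_tokens_per_mm_item = 2048         # e.g. one image -> 2048 encoder tokens
    max_num_reqs = 16                     # assumed max concurrent requests
    max_mm_items_per_req = 1              # single-image models, as in this PR

    # Encoder budget: items that fit in the encoder input / cache.
    max_num_mm_items_encoder_budget = min(
        max_num_encoder_input_tokens,
        encoder_cache_size) // max_tokens_per_mm_item

    # Decoder budget: per-request item limit times the request limit.
    max_num_mm_items_decoder_budget = max_num_reqs * max_mm_items_per_req

    # The dummy run profiles with the tighter of the two budgets.
    max_num_mm_items = min(max_num_mm_items_encoder_budget,
                           max_num_mm_items_decoder_budget)
    print(max_num_mm_items)  # -> 4

With these placeholder numbers the encoder side is the bottleneck (8192 // 2048 = 4 items vs. 16 on the decoder side), which is why the code deliberately ignores max_num_batched_tokens: embeddings can be precomputed and chunk-prefilled, so only the encoder and per-request limits constrain profiling.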
||||