[V1][VLM] Proper memory profiling for image language models (#11210)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: ywang96 <ywang@example.com>
This commit is contained in:
Roger Wang
2024-12-16 22:10:57 -08:00
committed by GitHub
parent 66d4b16724
commit 59c9b6ebeb
6 changed files with 98 additions and 13 deletions


@@ -1280,6 +1280,14 @@ class SchedulerConfig:
is_multimodal_model: bool = False
# FIXME(woosuk & ywang96): Below are placeholder values. We need to
# calculate the actual values from the configurations.
# Multimodal encoder run compute budget, only used in V1
max_num_encoder_input_tokens = 16384
# Multimodal encoder cache size, only used in V1
encoder_cache_size = 16384
# Whether to perform preemption by swapping or
# recomputation. If not specified, we determine the mode as follows:
# We use recomputation by default since it incurs lower overhead than
# swapping.
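The two placeholder fields added above bound how much multimodal encoder work the V1 scheduler will admit: a per-step compute budget (`max_num_encoder_input_tokens`) and a cap on cached encoder outputs (`encoder_cache_size`). A minimal sketch of how such limits could gate scheduling is below; the helper `can_schedule_encoder_input` and its exact semantics are assumptions for illustration, not the actual vLLM scheduler logic.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    is_multimodal_model: bool = False
    # Placeholder multimodal encoder compute budget, only used in V1.
    max_num_encoder_input_tokens: int = 16384
    # Placeholder multimodal encoder cache size, only used in V1.
    encoder_cache_size: int = 16384


def can_schedule_encoder_input(config: SchedulerConfig,
                               num_tokens: int,
                               cached_tokens: int) -> bool:
    """Hypothetical admission check: an encoder input fits only if it is
    within the per-step compute budget AND its output fits in the
    encoder cache alongside what is already cached."""
    fits_budget = num_tokens <= config.max_num_encoder_input_tokens
    fits_cache = cached_tokens + num_tokens <= config.encoder_cache_size
    return fits_budget and fits_cache
```

With the default placeholder values, a 1,000-token image embedding would be admitted while a 20,000-token one would be rejected; the commit's FIXME notes that these defaults should eventually be derived from the model configuration rather than hard-coded.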