[V1][VLM] Proper memory profiling for image language models (#11210)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: ywang96 <ywang@example.com>
@@ -1280,6 +1280,14 @@ class SchedulerConfig:
     is_multimodal_model: bool = False
 
+    # FIXME(woosuk & ywang96): Below are placeholder values. We need to
+    # calculate the actual values from the configurations.
+    # Multimodal encoder run compute budget, only used in V1
+    max_num_encoder_input_tokens = 16384
+
+    # Multimodal encoder cache size, only used in V1
+    encoder_cache_size = 16384
+
     # Whether to perform preemption by swapping or
     # recomputation. If not specified, we determine the mode as follows:
     # We use recomputation by default since it incurs lower overhead than
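The new `max_num_encoder_input_tokens` field acts as a per-step compute budget for the multimodal encoder. A minimal sketch of how such a budget could cap the encoder inputs scheduled in one step is shown below; this is an illustrative greedy loop, not vLLM's actual V1 scheduler, and `schedule_encoder_inputs` and `pending` are hypothetical names.

```python
# Hypothetical sketch: cap the multimodal encoder inputs scheduled in one
# step by a token budget (analogous to max_num_encoder_input_tokens above).
# This is NOT vLLM's real scheduler code; names are illustrative.

def schedule_encoder_inputs(pending, budget=16384):
    """Greedily pick pending encoder inputs whose token counts fit the budget.

    pending: list of (request_id, num_encoder_tokens) pairs.
    Returns (scheduled request ids, tokens used this step).
    """
    scheduled, used = [], 0
    for req_id, num_tokens in pending:
        if used + num_tokens > budget:
            break  # budget exhausted for this step; defer the rest
        scheduled.append(req_id)
        used += num_tokens
    return scheduled, used

# Three image inputs of 7000 encoder tokens each: only two fit in 16384.
picked, used = schedule_encoder_inputs([("a", 7000), ("b", 7000), ("c", 7000)])
```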