[V1][VLM] Proper memory profiling for image language models (#11210)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: ywang96 <ywang@example.com>
This commit is contained in:
Roger Wang
2024-12-16 22:10:57 -08:00
committed by GitHub
parent 66d4b16724
commit 59c9b6ebeb
6 changed files with 98 additions and 13 deletions


@@ -1280,6 +1280,14 @@ class SchedulerConfig:
is_multimodal_model: bool = False
# FIXME(woosuk & ywang96): Below are placeholder values. We need to
# calculate the actual values from the configurations.
# Multimodal encoder run compute budget, only used in V1
max_num_encoder_input_tokens = 16384
# Multimodal encoder cache size, only used in V1
encoder_cache_size = 16384
# Whether to perform preemption by swapping or
# recomputation. If not specified, we determine the mode as follows:
# We use recomputation by default since it incurs lower overhead than
# swapping.
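The two placeholder fields added above bound how much multimodal encoder work the V1 scheduler will admit: a per-step compute budget (`max_num_encoder_input_tokens`) and a cap on cached encoder outputs (`encoder_cache_size`). A minimal sketch of how such limits could gate scheduling is below; the helper `can_schedule_encoder_input` and its exact semantics are assumptions for illustration, not the actual vLLM scheduler logic.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    is_multimodal_model: bool = False
    # Placeholder multimodal encoder compute budget, only used in V1.
    max_num_encoder_input_tokens: int = 16384
    # Placeholder multimodal encoder cache size, only used in V1.
    encoder_cache_size: int = 16384


def can_schedule_encoder_input(config: SchedulerConfig,
                               num_tokens: int,
                               cached_tokens: int) -> bool:
    """Hypothetical admission check: an encoder input fits only if it is
    within the per-step compute budget AND its output fits in the
    encoder cache alongside what is already cached."""
    fits_budget = num_tokens <= config.max_num_encoder_input_tokens
    fits_cache = cached_tokens + num_tokens <= config.encoder_cache_size
    return fits_budget and fits_cache
```

With the default placeholder values, a 1,000-token image embedding would be admitted while a 20,000-token one would be rejected; the commit's FIXME notes that these defaults should eventually be derived from the model configuration rather than hard-coded.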