[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)

2024-08-15 01:55:42 +08:00
parent 70b746efcf
commit 3f674a49b5
38 changed files with 572 additions and 216 deletions
--- a/tests/models/test_phi3v.py
+++ b/tests/models/test_phi3v.py
@@ -73,7 +73,7 @@ def run_test(
    All the image fixtures for the test is under tests/images.
    For huggingface runner, we provide the PIL images as input.
    For vllm runner, we provide MultiModalDataDict objects 
-    and corresponding vision language config as input.
+    and corresponding MultiModalConfig as input.
    Note, the text input is also adjusted to abide by vllm contract.
    The text output is sanitized to be able to compare with hf.
    """