[Doc] Update LLaVA docs (#5437)
Co-authored-by: Roger Wang <ywang@roblox.com>
@@ -20,9 +20,9 @@ The following :ref:`engine arguments <engine_args>` are specific to VLMs:
 Currently, the support for vision language models on vLLM has the following limitations:
 
 * Only single image input is supported per text prompt.
-* Dynamic ``image_input_shape`` is not supported: the input image will be resized to the static ``image_input_shape``. This means model output might not exactly match the HuggingFace implementation.
+* Dynamic ``image_input_shape`` is not supported: the input image will be resized to the static ``image_input_shape``. This means our LLaVA-NeXT output may not exactly match the huggingface implementation.
 
-We are continuously improving user & developer experience for VLMs. Please raise an issue on GitHub if you have any feedback or feature requests.
+We are continuously improving user & developer experience for VLMs. Please `open an issue on GitHub <https://github.com/vllm-project/vllm/issues/new/choose>`_ if you have any feedback or feature requests.
 
 Offline Batched Inference
 -------------------------
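For context, the limitations above concern the VLM-specific engine arguments and the offline inference API documented in this file. Below is a minimal sketch of offline batched inference with LLaVA-1.5, assuming the multimodal API of this vLLM release (``ImagePixelData`` from ``vllm.multimodal.image``) and illustrative values for the VLM engine arguments (``image_input_type``, ``image_token_id``, ``image_input_shape``, ``image_feature_size``); the exact argument names and input format may differ in other versions, and the image path is hypothetical.

.. code-block:: python

    from PIL import Image

    from vllm import LLM
    from vllm.multimodal.image import ImagePixelData  # assumption: multimodal wrapper in this release

    # VLM-specific engine arguments; image_input_shape is static, so the
    # input image is resized to this shape (see the limitation above).
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )

    # Only a single image per text prompt is supported; the prompt carries
    # one <image> placeholder token per image feature.
    prompt = "<image>" * 576 + "\nUSER: What is the content of this image?\nASSISTANT:"
    image = Image.open("example.jpg")  # hypothetical local image

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImagePixelData(image),
    })

    for o in outputs:
        print(o.outputs[0].text)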