[Doc] Indicate more information about supported modalities (#8181)

This commit is contained in:
Cyrus Leung
2024-09-05 18:51:53 +08:00
committed by GitHub
parent e39ebf5cf5
commit 288a938872
7 changed files with 207 additions and 52 deletions

View File

@@ -21,7 +21,7 @@ If you have already taken care of the above issues, but the vLLM instance still
With more logging, hopefully you can find the root cause of the issue.
If it crashes, and the error trace shows somewhere around ``self.graph.replay()`` in ``vllm/worker/model_runner.py``, it is a cuda error inside cudagraph. To know the particular cuda operation that causes the error, you can add ``--enforce-eager`` to the command line, or ``enforce_eager=True`` to the ``LLM`` class, to disable the cudagraph optimization. This way, you can locate the exact cuda operation that causes the error.
If it crashes, and the error trace shows somewhere around ``self.graph.replay()`` in ``vllm/worker/model_runner.py``, it is a cuda error inside cudagraph. To know the particular cuda operation that causes the error, you can add ``--enforce-eager`` to the command line, or ``enforce_eager=True`` to the :class:`~vllm.LLM` class, to disable the cudagraph optimization. This way, you can locate the exact cuda operation that causes the error.
Here are some common issues that can cause hangs: