[Doc] Indicate more information about supported modalities (#8181)
This commit is contained in:
@@ -21,7 +21,7 @@ If you have already taken care of the above issues, but the vLLM instance still
|
||||
|
||||
With more logging, hopefully you can find the root cause of the issue.
|
||||
|
||||
If it crashes, and the error trace shows somewhere around ``self.graph.replay()`` in ``vllm/worker/model_runner.py``, it is a cuda error inside cudagraph. To know the particular cuda operation that causes the error, you can add ``--enforce-eager`` to the command line, or ``enforce_eager=True`` to the ``LLM`` class, to disable the cudagraph optimization. This way, you can locate the exact cuda operation that causes the error.
|
||||
If it crashes, and the error trace shows somewhere around ``self.graph.replay()`` in ``vllm/worker/model_runner.py``, it is a cuda error inside cudagraph. To know the particular cuda operation that causes the error, you can add ``--enforce-eager`` to the command line, or ``enforce_eager=True`` to the :class:`~vllm.LLM` class, to disable the cudagraph optimization. This way, you can locate the exact cuda operation that causes the error.
|
||||
|
||||
Here are some common issues that can cause hangs:
|
||||
|
||||
|
||||
@@ -24,7 +24,9 @@ Offline Batched Inference
|
||||
|
||||
We first show an example of using vLLM for offline batched inference on a dataset. In other words, we use vLLM to generate texts for a list of input prompts.
|
||||
|
||||
Import ``LLM`` and ``SamplingParams`` from vLLM. The ``LLM`` class is the main class for running offline inference with vLLM engine. The ``SamplingParams`` class specifies the parameters for the sampling process.
|
||||
Import :class:`~vllm.LLM` and :class:`~vllm.SamplingParams` from vLLM.
|
||||
The :class:`~vllm.LLM` class is the main class for running offline inference with vLLM engine.
|
||||
The :class:`~vllm.SamplingParams` class specifies the parameters for the sampling process.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@@ -42,7 +44,7 @@ Define the list of input prompts and the sampling parameters for generation. The
|
||||
]
|
||||
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
|
||||
|
||||
Initialize vLLM's engine for offline inference with the ``LLM`` class and the `OPT-125M model <https://arxiv.org/abs/2205.01068>`_. The list of supported models can be found at :ref:`supported models <supported_models>`.
|
||||
Initialize vLLM's engine for offline inference with the :class:`~vllm.LLM` class and the `OPT-125M model <https://arxiv.org/abs/2205.01068>`_. The list of supported models can be found at :ref:`supported models <supported_models>`.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
|
||||
Reference in New Issue
Block a user