Optimize Mixtral with expert parallelism (#2090)

This commit is contained in:
Antoni Baum
2023-12-13 23:55:07 -08:00
committed by GitHub
parent f1c8520146
commit 21d93c140d
6 changed files with 230 additions and 343 deletions


@@ -74,8 +74,7 @@ Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for in
 Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.
 .. note::
-    Currently, the ROCm version of vLLM does not support Mixtral.
-    Additionally, it only supports Mistral for context lengths up to 4096.
+    Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
 .. tip::
     The easiest way to check if your model is supported is to run the program below:
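The referenced program is truncated in this excerpt. A minimal sketch of such a check, using vLLM's ``LLM`` entry point (the exact model name is a placeholder you would substitute):

.. code-block:: python

    from vllm import LLM

    # If vLLM successfully loads the model and generates output,
    # the model is supported; otherwise an error is raised.
    llm = LLM(model=...)  # Name or path of your model
    output = llm.generate("Hello, my name is")
    print(output)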