[Docs] Add information about using shared memory in docker (#1845)

Simon Mo
2023-11-29 18:33:56 -08:00
committed by GitHub
parent a9e4574261
commit 0f621c2c7d
2 changed files with 10 additions and 2 deletions

@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.co
    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        -p 8000:8000 \
        --ipc=host \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1
.. note::

    You can use either the ``--ipc=host`` flag or the ``--shm-size`` flag to allow the
    container to access the host's shared memory. vLLM uses PyTorch, which uses shared
    memory to share data between processes under the hood, particularly for tensor parallel inference.
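If ``--ipc=host`` is not permitted (for example, under a restrictive container policy), the container's shared-memory allocation can be raised instead with ``--shm-size``. A sketch of the same command using that flag; the ``16gb`` value is illustrative, not a vLLM requirement, and should be sized to your model and degree of tensor parallelism:

.. code-block:: console

    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        -p 8000:8000 \
        --shm-size=16gb \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1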
You can build and run vLLM from source via the provided Dockerfile. To build vLLM:

.. code-block:: console