diff --git a/docs/source/index.rst b/docs/source/index.rst
index 6fbfd01d7..6420b98e5 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -62,6 +62,7 @@ Documentation
:caption: Serving
serving/distributed_serving
+ serving/run_on_sky
.. toctree::
:maxdepth: 1
diff --git a/docs/source/serving/run_on_sky.rst b/docs/source/serving/run_on_sky.rst
new file mode 100644
index 000000000..460aaf7ae
--- /dev/null
+++ b/docs/source/serving/run_on_sky.rst
@@ -0,0 +1,69 @@
+.. _on_cloud:
+
+Running on clouds with SkyPilot
+===============================
+
+vLLM can be run on the cloud to scale to multiple GPUs with `SkyPilot <https://github.com/skypilot-org/skypilot>`__, an open-source framework for running LLMs on any cloud.
+
+To install SkyPilot and set up your cloud credentials, run:
+
+.. code-block:: console
+
+ $ pip install skypilot
+ $ sky check
+
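+SkyPilot can also be installed with cloud-specific extras. As a sketch (assuming you only use AWS and that the ``skypilot[aws]`` extra is available in your SkyPilot version), install just that cloud's dependencies and re-run the check:
+
+.. code-block:: console
+
+   $ pip install "skypilot[aws]"
+   $ sky check
+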
+See the vLLM SkyPilot YAML for serving, ``serving.yaml``, below:
+
+.. code-block:: yaml
+
+   resources:
+      accelerators: A100
+
+   envs:
+      MODEL_NAME: decapoda-research/llama-13b-hf
+      TOKENIZER: hf-internal-testing/llama-tokenizer
+
+   setup: |
+      conda create -n vllm python=3.9 -y
+      conda activate vllm
+      git clone https://github.com/vllm-project/vllm.git
+      cd vllm
+      pip install .
+      pip install gradio
+
+   run: |
+      conda activate vllm
+      echo 'Starting vllm api server...'
+      python -u -m vllm.entrypoints.api_server \
+        --model $MODEL_NAME \
+        --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
+        --tokenizer $TOKENIZER 2>&1 | tee api_server.log &
+      echo 'Waiting for vllm api server to start...'
+      while ! grep -q 'Uvicorn running on' api_server.log; do sleep 1; done
+      echo 'Starting gradio server...'
+      python vllm/examples/gradio_webserver.py
+
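+Besides the Gradio frontend, the ``vllm.entrypoints.api_server`` started in ``run`` exposes a simple ``/generate`` endpoint. As a minimal sketch (assuming the server's default port 8000 and that the command is run on the cluster's head node), you can query it directly with ``curl``:
+
+.. code-block:: console
+
+   $ curl http://localhost:8000/generate \
+       -H "Content-Type: application/json" \
+       -d '{"prompt": "San Francisco is a", "max_tokens": 32}'
+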
+Start serving the LLaMA-13B model on an A100 GPU:
+
+.. code-block:: console
+
+ $ sky launch serving.yaml
+
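+While the task is running, you can inspect the provisioned cluster from another terminal with SkyPilot's standard status command:
+
+.. code-block:: console
+
+   $ sky status
+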
+Check the output of the command. There will be a shareable Gradio link (like the last line of the following output). Open it in your browser to use the LLaMA model for text completion.
+
+.. code-block:: console
+
+   (task, pid=7431) Running on public URL: https://<gradio-hash>.gradio.live
+
+**Optional**: Serve the 65B model instead of the default 13B and use more GPUs:
+
+.. code-block:: console
+
+   $ sky launch -c vllm-serve-new -s serving.yaml --gpus A100:8 --env MODEL_NAME=decapoda-research/llama-65b-hf
+
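+When you are done, tear down the cluster to stop incurring charges (here using the cluster name from the command above):
+
+.. code-block:: console
+
+   $ sky down vllm-serve-new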