[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)

2024-07-27 10:28:33 -07:00
parent f954d0715c
commit fad5576c58
6 changed files with 24 additions and 7 deletions
--- a/docs/source/getting_started/tpu-installation.rst
+++ b/docs/source/getting_started/tpu-installation.rst
@@ -56,7 +56,7 @@ First, install the dependencies:
    $ pip uninstall torch torch-xla -y

    $ # Install PyTorch and PyTorch XLA.
-    $ export DATE="+20240713"
+    $ export DATE="+20240726"
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch-nightly${DATE}-cp310-cp310-linux_x86_64.whl
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly${DATE}-cp310-cp310-linux_x86_64.whl

@@ -75,6 +75,13 @@ Next, build vLLM from source. This will only take a few seconds:
    $ VLLM_TARGET_DEVICE="tpu" python setup.py develop


+.. note::
+
+    Since TPU relies on XLA, which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each bucketed shape.
+    Compilation may take 20-30 minutes on the first run.
+    Afterwards, compilation time drops to about 5 minutes because the compiled XLA graphs are cached on disk (in :code:`VLLM_XLA_CACHE_PATH`, or :code:`~/.cache/vllm/xla_cache` by default).
+
+
 .. tip::

    If you encounter the following error:
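The note added in this commit describes two mechanisms: padding inputs into a small set of shape buckets so that XLA compiles one graph per bucket, and caching the compiled graphs on disk. The Python sketch below illustrates both ideas under stated assumptions. The power-of-two bucket sizes, the 16-token minimum bucket, and the helper names `bucket_size`, `pad_to_bucket`, `xla_cache_dir`, and `GraphCache` are hypothetical and are not taken from vLLM's code. Only the `VLLM_XLA_CACHE_PATH` variable and its `~/.cache/vllm/xla_cache` default come from the note.

```python
import os
from pathlib import Path

# Hypothetical illustration only: MIN_BUCKET and power-of-two growth are
# assumptions, not vLLM's documented bucketing policy.
MIN_BUCKET = 16


def bucket_size(num_tokens: int, min_bucket: int = MIN_BUCKET) -> int:
    """Round num_tokens up to the smallest power-of-two bucket >= min_bucket."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    size = min_bucket
    while size < num_tokens:
        size *= 2
    return size


def pad_to_bucket(token_ids: list[int], pad_id: int = 0) -> list[int]:
    """Pad a sequence with pad_id so its length equals its bucket size."""
    target = bucket_size(len(token_ids))
    return token_ids + [pad_id] * (target - len(token_ids))


def xla_cache_dir() -> Path:
    """Cache directory as documented: VLLM_XLA_CACHE_PATH, else ~/.cache/vllm/xla_cache."""
    default = os.path.join("~", ".cache", "vllm", "xla_cache")
    return Path(os.environ.get("VLLM_XLA_CACHE_PATH", default)).expanduser()


class GraphCache:
    """Toy in-memory stand-in for a compiled-graph cache keyed by padded length."""

    def __init__(self) -> None:
        self.graphs: dict[int, str] = {}
        self.compilations = 0

    def get(self, padded_len: int) -> str:
        if padded_len not in self.graphs:
            # In a real system this is the expensive XLA compilation step.
            self.compilations += 1
            self.graphs[padded_len] = f"graph[{padded_len}]"
        return self.graphs[padded_len]


if __name__ == "__main__":
    cache = GraphCache()
    for length in (5, 9, 16, 17, 30):
        padded = pad_to_bucket(list(range(length)))
        cache.get(len(padded))
    print(cache.compilations)  # 2: lengths fall into buckets 16 and 32
    print(xla_cache_dir())
```

Padding to a few bucket sizes trades some wasted computation on pad tokens for a bounded number of compilations: here, five inputs of different lengths trigger only two compilations. Persisting the compiled graphs on disk, rather than in memory as this toy cache does, is what lets later runs skip most of that cost.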