# Google TPU
Tensor Processing Units (TPUs) are Google's custom-developed application-specific
integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs
are available in different versions, each with different hardware specifications.
For more information about TPUs, see [TPU System Architecture](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm).
For more information on the TPU versions supported with vLLM, see:

- [TPU v6e](https://cloud.google.com/tpu/docs/v6e)
- [TPU v5e](https://cloud.google.com/tpu/docs/v5e)
- [TPU v5p](https://cloud.google.com/tpu/docs/v5p)
- [TPU v4](https://cloud.google.com/tpu/docs/v4)
These TPU versions allow you to configure the physical arrangement of the TPU
chips, which can improve throughput and networking performance. For more
information, see:

- [TPU v6e topologies](https://cloud.google.com/tpu/docs/v6e#configurations)
- [TPU v5e topologies](https://cloud.google.com/tpu/docs/v5e#tpu-v5e-config)
- [TPU v5p topologies](https://cloud.google.com/tpu/docs/v5p#tpu-v5p-config)
- [TPU v4 topologies](https://cloud.google.com/tpu/docs/v4#tpu-v4-config)
To use Cloud TPUs, you need TPU quota granted to your Google Cloud Platform
project. TPU quotas specify how many TPUs you can use in a GCP project and are
specified in terms of TPU version, the number of TPUs you want to use, and quota
type. For more information, see [TPU quota](https://cloud.google.com/tpu/docs/quota#tpu_quota).

For TPU pricing information, see [Cloud TPU pricing](https://cloud.google.com/tpu/pricing).

You may need additional persistent storage for your TPU VMs. For more
information, see [Storage options for Cloud TPU data](https://cloud.google.com/tpu/docs/storage-options).
!!! warning
There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
## Requirements
- Google Cloud TPU VM
- TPU versions: v6e, v5e, v5p, v4
- Python: 3.11 or newer
### Provision Cloud TPUs
You can provision Cloud TPUs using the [Cloud TPU API](https://cloud.google.com/tpu/docs/reference/rest)
or the [queued resources](https://cloud.google.com/tpu/docs/queued-resources)
API (preferred). This section shows how to create TPUs using the queued resource
API. For more information about using the Cloud TPU API, see [Create a Cloud TPU using the Create Node API](https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api).
Queued resources enable you to request Cloud TPU resources in a queued manner.
When you request queued resources, the request is added to a queue maintained by
the Cloud TPU service. When the requested resource becomes available, it's
assigned to your Google Cloud project for your immediate exclusive use.
!!! note
In all of the following commands, replace the ALL CAPS parameter names with
appropriate values. See the parameter descriptions table for more information.
### Provision Cloud TPUs with GKE
For more information about using TPUs with GKE, see:
- [About TPUs in GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/tpus)
- [Deploy TPU workloads in GKE Standard](https://cloud.google.com/kubernetes-engine/docs/how-to/tpus)
- [Plan for TPUs in GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/plan-tpus)
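
As a minimal sketch of what this looks like in practice, the following command
adds a single-host TPU slice node pool to an existing GKE Standard cluster. The
pool, cluster, and zone names are placeholders, and the `ct5lp-hightpu-4t`
machine type (a single-host v5e slice with 4 chips) is just one example; see the
pages above for the machine types and topologies that apply to your TPU version.

```bash
# Illustrative sketch: add a single-host TPU v5e (4-chip) node pool to an
# existing GKE Standard cluster. Replace all placeholder values.
gcloud container node-pools create tpu-v5e-pool \
  --cluster CLUSTER_NAME \
  --zone ZONE \
  --machine-type ct5lp-hightpu-4t \
  --num-nodes 1
```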
## Configure a new environment
### Provision a Cloud TPU with the queued resource API
Create a TPU v5e with 4 TPU chips:
```bash
gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
--node-id TPU_NAME \
--project PROJECT_ID \
--zone ZONE \
--accelerator-type ACCELERATOR_TYPE \
--runtime-version RUNTIME_VERSION \
--service-account SERVICE_ACCOUNT
```
| Parameter name | Description |
|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| QUEUED_RESOURCE_ID | The user-assigned ID of the queued resource request. |
| TPU_NAME | The user-assigned name of the TPU, which is created when the queued resource request is allocated. |
| PROJECT_ID | Your Google Cloud project ID. |
| ZONE | The GCP zone where you want to create your Cloud TPU. The value you use depends on the version of TPUs you are using. For more information, see [TPU regions and zones]. |
| ACCELERATOR_TYPE | The TPU version you want to use. Specify the TPU version, for example `v5litepod-4` specifies a v5e TPU with 4 cores, and `v6e-1` specifies a v6e TPU with 1 core. For more information, see [TPU versions]. |
| RUNTIME_VERSION | The TPU VM runtime version to use. For example, use `v2-alpha-tpuv6e` for a VM loaded with one or more v6e TPUs. |
| SERVICE_ACCOUNT | The email address for your service account. You can find it in the IAM Cloud Console under *Service Accounts*. For example: `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com` |
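
Because the request is queued, the TPU is not necessarily allocated right away.
You can check the state of your request (it becomes `ACTIVE` once the TPU is
assigned to you), for example:

```bash
gcloud alpha compute tpus queued-resources describe QUEUED_RESOURCE_ID \
  --project PROJECT_ID \
  --zone ZONE
```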
Connect to your TPU VM using SSH:
```bash
gcloud compute tpus tpu-vm ssh TPU_NAME --project PROJECT_ID --zone ZONE
```
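
Once connected, a quick sanity check is to confirm that the accelerator device
files are visible. The exact device naming varies by TPU generation (for
example, `/dev/accel*` on older generations, VFIO devices under `/dev/vfio/` on
newer ones), so treat this as a rough check:

```bash
# Device naming varies by TPU generation; at least one of these should exist.
ls /dev/accel* /dev/vfio/ 2>/dev/null
```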
!!! note
When configuring `RUNTIME_VERSION` ("TPU software version") on GCP, ensure it matches the TPU generation you've selected by referencing the [TPU VM images] compatibility matrix. Using an incompatible version may prevent vLLM from running correctly.
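
If you are unsure which runtime versions are offered in your zone, recent
`gcloud` releases can list them; the exact command group can vary between
releases, so fall back to `gcloud compute tpus --help` if this form is
unavailable:

```bash
# May vary by gcloud release; see `gcloud compute tpus --help` if needed.
gcloud compute tpus tpu-vm versions list --zone ZONE
```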
[TPU versions]: https://cloud.google.com/tpu/docs/runtimes
[TPU VM images]: https://cloud.google.com/tpu/docs/runtimes
[TPU regions and zones]: https://cloud.google.com/tpu/docs/regions-zones
## Set up using Python
### Pre-built wheels
Currently, there are no pre-built TPU wheels.
### Build wheel from source
Install Miniconda:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
```
Create and activate a Conda environment for vLLM:
```bash
conda create -n vllm python=3.12 -y
conda activate vllm
```
Clone the vLLM repository and go to the vLLM directory:
```bash
git clone https://github.com/vllm-project/vllm.git && cd vllm
```
Uninstall the existing `torch` and `torch_xla` packages:
```bash
pip uninstall torch torch-xla -y
```
Install build dependencies:
```bash
pip install -r requirements/tpu.txt
sudo apt-get install --no-install-recommends --yes libopenblas-base libopenmpi-dev libomp-dev
```
Build and install vLLM:
```bash
VLLM_TARGET_DEVICE="tpu" python -m pip install -e .
```
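
Once the build finishes, a short smoke test is a reasonable way to confirm the
install. The model name below is only an example; substitute any model you have
access to (the first run also triggers XLA compilation, so smaller models
finish sooner):

```bash
# Verify the package imports, then start an OpenAI-compatible server.
# The model name is illustrative; use any model you have access to.
python -c 'import vllm; print(vllm.__version__)'
vllm serve Qwen/Qwen2.5-1.5B-Instruct --max-model-len 512
```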
## Set up using Docker
### Pre-built images
See [deployment-docker-pre-built-image][deployment-docker-pre-built-image] for instructions on using the official Docker image, making sure to substitute the image name `vllm/vllm-openai` with `vllm/vllm-tpu`.
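
For example, assuming a published tag on the `vllm/vllm-tpu` repository (the
`nightly` tag here is an assumption; check the registry for the tags that
actually exist):

```bash
# Tag is an assumption; verify available tags before pulling.
docker run --privileged --net host --shm-size=16G -it vllm/vllm-tpu:nightly
```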
### Build image from source
You can use <gh-file:docker/Dockerfile.tpu> to build a Docker image with TPU support.
```bash
docker build -f docker/Dockerfile.tpu -t vllm-tpu .
```
Run the Docker image with the following command:
```bash
# Make sure to add `--privileged --net host --shm-size=16G`.
docker run --privileged --net host --shm-size=16G -it vllm-tpu
```
!!! note
    Since TPU relies on XLA, which requires static shapes, vLLM bucketizes the
    possible input shapes and compiles an XLA graph for each shape. The
    compilation may take 20 to 30 minutes on the first run, but drops to about
    5 minutes afterwards because the XLA graphs are cached on disk (in
    `VLLM_XLA_CACHE_PATH`, `~/.cache/vllm/xla_cache` by default).
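
To keep that cache across container runs, one option is to mount a host
directory into the container and point `VLLM_XLA_CACHE_PATH` at it; the paths
below are illustrative:

```bash
# Illustrative paths: persist the XLA compilation cache on the host.
docker run --privileged --net host --shm-size=16G -it \
  -v /mnt/xla_cache:/root/xla_cache \
  -e VLLM_XLA_CACHE_PATH=/root/xla_cache \
  vllm-tpu
```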
!!! tip
    If you encounter the following error:

    ```console
    from torch._C import * # noqa: F403
    ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
    ```

    Install OpenBLAS with the following command:

    ```bash
    sudo apt-get install --no-install-recommends --yes libopenblas-base libopenmpi-dev libomp-dev
    ```