[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)

Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-02 17:06:20 -07:00
parent c57d577e8d
commit 9112b443a0
11 changed files with 605 additions and 72 deletions
--- a/vllm/config.py
+++ b/vllm/config.py
@@ -1901,6 +1901,8 @@ class ParallelConfig:
            if current_platform.is_neuron():
                # neuron uses single process to control multiple devices
                backend = "uni"
+            elif current_platform.is_tpu() and envs.VLLM_XLA_USE_SPMD:
+                backend = "uni"
            elif (current_platform.is_cuda()
                  and cuda_device_count_stateless() < self.world_size):
                if not ray_found:
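The hunk above adds one branch to the executor-backend selection in `ParallelConfig`. On TPU with `VLLM_XLA_USE_SPMD` enabled, a single process drives every local chip, and XLA's SPMD partitioner does the sharding, so the single-process `"uni"` backend is chosen, just as it already is for Neuron. The sketch below isolates only those two branches. It is a minimal sketch under stated assumptions, not vLLM's code. `select_uni_backend` and `xla_use_spmd` are hypothetical helpers, and the plain `platform` string stands in for the `current_platform.is_neuron()` / `is_tpu()` checks. It also assumes the flag is parsed as an integer string that defaults to off.

```python
import os
from typing import Mapping, Optional


def xla_use_spmd(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: VLLM_XLA_USE_SPMD is an integer string ("0"/"1"), default off.
    return bool(int(environ.get("VLLM_XLA_USE_SPMD", "0")))


def select_uni_backend(platform: str, use_spmd: bool) -> Optional[str]:
    """Hypothetical helper: return "uni" when one process should drive all
    devices, else None.

    `platform` is a hypothetical string stand-in for vLLM's
    `current_platform.is_neuron()` / `current_platform.is_tpu()` checks.
    """
    if platform == "neuron":
        # Neuron uses a single process to control multiple devices.
        return "uni"
    if platform == "tpu" and use_spmd:
        # Branch added by this commit: with SPMD, one process drives all
        # local TPU chips and XLA partitions the model across them.
        return "uni"
    # Everything else falls through to the remaining checks in
    # ParallelConfig (CUDA device count, Ray availability, ...), which
    # this sketch does not model.
    return None


print(select_uni_backend("tpu", xla_use_spmd({"VLLM_XLA_USE_SPMD": "1"})))  # uni
print(select_uni_backend("tpu", xla_use_spmd({})))  # None
```

Because the TPU check sits before the CUDA/Ray branch, an SPMD TPU run never reaches the multi-process path, even when `world_size` is greater than 1.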