[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)

Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-05-07 21:28:47 +01:00
parent db593aa67f
commit c20ef40fd0
19 changed files with 929 additions and 46 deletions
--- a/.buildkite/scripts/hardware_ci/run-tpu-v1-test.sh
+++ b/.buildkite/scripts/hardware_ci/run-tpu-v1-test.sh
@@ -50,6 +50,9 @@ docker run --privileged --net host --shm-size=16G -it \
    && pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py \
    && echo TEST_12 \
    && pytest -s -v /workspace/vllm/tests/tpu/test_moe_pallas.py" \
+    # Disable the TPU LoRA tests until the feature is activated
+    # && echo TEST_13 \
+    # && pytest -s -v /workspace/vllm/tests/tpu/lora/" \


 # TODO: This test fails because it uses RANDOM_SEED sampling