[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
Hyesoo Yang
2025-04-02 17:18:08 -07:00
committed by GitHub
parent 55acf86bf8
commit 1b84eff03a
3 changed files with 174 additions and 15 deletions

@@ -36,7 +36,9 @@ docker run --privileged --net host --shm-size=16G -it \
  && echo TEST_6 \
  && pytest -s -v /workspace/vllm/tests/v1/tpu/worker/test_tpu_model_runner.py \
  && echo TEST_7 \
- && pytest -s -v /workspace/vllm/tests/v1/tpu/test_sampler.py" \
+ && pytest -s -v /workspace/vllm/tests/v1/tpu/test_sampler.py \
+ && echo TEST_8 \
+ && pytest -s -v /workspace/vllm/tests/v1/tpu/test_topk_topp_sampler.py" \
  # TODO: This test fails because it uses RANDOM_SEED sampling