[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung
2025-10-03 01:04:57 +08:00
committed by GitHub
parent 3b279a84be
commit d00d652998
22 changed files with 101 additions and 66 deletions

@@ -67,8 +67,7 @@ docker run -it \
The XPU platform supports **tensor parallel** inference/serving and also supports **pipeline parallel** as a beta feature for online serving. **Pipeline parallel** is supported on a single node with mp as the backend. A reference invocation looks like the following:
```bash
-python -m vllm.entrypoints.openai.api_server \
-     --model=facebook/opt-13b \
+vllm serve facebook/opt-13b \
--dtype=bfloat16 \
--max_model_len=1024 \
--distributed-executor-backend=mp \