[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung
2025-10-03 01:04:57 +08:00
committed by GitHub
parent 3b279a84be
commit d00d652998
22 changed files with 101 additions and 66 deletions

@@ -48,10 +48,9 @@ The following code configures vLLM in an offline mode to use speculative decodin
 To perform the same with an online mode launch the server:
 ```bash
-python -m vllm.entrypoints.openai.api_server \
+vllm serve facebook/opt-6.7b \
 --host 0.0.0.0 \
 --port 8000 \
---model facebook/opt-6.7b \
 --seed 42 \
 -tp 1 \
 --gpu_memory_utilization 0.8 \