Cyrus Leung
981b0d5673
[Frontend] Factor out code for running uvicorn (#6828)
2024-07-27 09:58:25 +08:00
Evan Z. Liu
5689e256ba
[Frontend] Represent tokens with identifiable strings (#6626)
2024-07-25 09:51:00 +08:00
Daniele
ee812580f7
[Frontend] split run_server into build_server and run_server (#6740)
2024-07-24 10:36:04 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing (#4028)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Daniele
51f8aa90ad
[Bugfix][Frontend] remove duplicate init logger (#6581)
2024-07-19 10:16:27 -07:00
Cyrus Leung
6366efc67b
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463)
2024-07-19 03:55:13 +00:00
Nick Hill
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00
sasha0552
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
2024-07-16 20:18:09 +08:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI (#5090)
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Swapnil Parekh
4d6ada947c
[CORE] Adding support for insertion of soft-tuned prompts (#4645)
...
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-09 13:26:36 -07:00
youkaichao
3b08fe2b13
[misc][frontend] log all available endpoints (#6195)
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-07-07 15:11:12 -07:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config (#5852)
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints (#5054)
2024-06-26 16:54:22 +00:00
Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils (#5098)
2024-06-13 11:21:39 -07:00
Roger Wang
68bc81703e
[Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (#5374)
2024-06-10 09:13:39 +00:00
Nadav Shmayovits
37464a0f74
[Bugfix] Fix call to init_logger in openai server (#4765)
2024-06-01 17:18:50 +00:00
Pierre Dulac
9216b9cc38
[Bugfix] Bypass authorization API token for preflight requests (#4862)
2024-05-16 09:42:21 -07:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)
2024-05-11 11:30:37 -07:00
Cyrus Leung
f12b20decc
[Frontend] Move async logic outside of constructor (#4674)
2024-05-08 22:48:33 -07:00
Cyrus Leung
323f27b904
[Bugfix] Fix asyncio.Task not being subscriptable (#4623)
2024-05-06 09:31:05 -07:00
Yang, Bo
808632d3b4
[BugFix] Prevent the task of _force_log from being garbage collected (#4567)
2024-05-03 01:35:18 +00:00
youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables (#4548)
2024-05-02 11:13:25 -07:00
Robert Shaw
4dc8026d86
[Bugfix] Fix 307 Redirect for /metrics (#4523)
2024-05-01 09:14:13 -07:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
SangBin Cho
0ae11f78ab
[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)
2024-04-22 21:32:44 -07:00
Harry Mellor
66ded03067
Allow model to be served under multiple names (#2894)
...
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
2024-04-18 00:16:26 -07:00
A-Mahla
0739b1947f
[Frontend][Bugfix] allow using the default middleware with a root path (#3788)
...
Co-authored-by: A-Mahla <>
2024-04-02 01:20:28 -07:00
yhu422
d8658c8cc1
Usage Stats Collection (#2852)
2024-03-28 22:16:12 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495)
2024-03-25 07:59:47 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288)
2024-03-18 22:05:34 -07:00
Dan Clark
03d37f2441
[Fix] Add args for mTLS support (#3430)
...
Co-authored-by: declark1 <daniel.clark@ibm.com>
2024-03-15 09:56:13 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit (#3305)
2024-03-10 19:49:14 -07:00
Nick Hill
d2339d6840
Connect engine healthcheck to openai server (#3260)
2024-03-07 16:38:12 -08:00
Jason Cox
d65fac2738
Add vLLM version info to logs and openai API server (#3161)
2024-03-02 21:00:29 -08:00
Allen.Dou
29e70e3e88
allow user chose log level by --log-level instead of fixed 'info'. (#3109)
...
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-01 23:28:41 +00:00
Harry Mellor
ef978fe411
Port metrics from aioprometheus to prometheus_client (#2730)
2024-02-25 11:54:00 -08:00
jvmncs
8f36444c4f
multi-LoRA as extra models in OpenAI server (#2775)
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server lists three separate entries when a user queries `/models`: one for the base served model and one for each specified LoRA module. Here `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.
No work has been done here to scope client permissions to specific models.
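As a rough illustration of the behaviour described above, the sketch below splits `name=path` pairs of the kind passed to `--lora-modules` and derives the model names a client would expect to see listed. The helpers `parse_lora_modules` and `served_model_ids` and the placeholder path are hypothetical names made up for this example; this is not the server's actual implementation.

```python
from typing import Dict, List


def parse_lora_modules(pairs: List[str]) -> Dict[str, str]:
    """Split 'name=path' strings into a {name: path} mapping.

    Hypothetical helper for illustration only; the real server may parse
    its arguments differently.
    """
    modules: Dict[str, str] = {}
    for pair in pairs:
        name, sep, path = pair.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {pair!r}")
        modules[name] = path
    return modules


def served_model_ids(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    """Return the base model name followed by one name per LoRA module."""
    return [base_model, *lora_modules]


# Placeholder path (an assumption); both names point at the same adapter,
# as in the commit message above.
lora_path = "/path/to/sql-lora"
modules = parse_lora_modules([f"sql-lora={lora_path}", f"sql-lora2={lora_path}"])
print(served_model_ids("meta-llama/Llama-2-7b-hf", modules))
# prints ['meta-llama/Llama-2-7b-hf', 'sql-lora', 'sql-lora2']
```

Two module names may share one path, which is why the listing has three entries even though only one adapter exists on disk.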
2024-02-17 12:00:48 -08:00
Erfan Al-Hossami
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares (#1106)
2024-01-23 15:13:00 -08:00
Jannis Schönleber
71d63ed72e
migrate pydantic from v1 to v2 (#2531)
2024-01-21 16:05:56 -08:00
FlorianJoncour
14cc317ba4
OpenAI Server refactoring (#2360)
2024-01-16 21:33:14 -08:00
Chirag Jain
ce036244c9
Allow setting fastapi root_path argument (#2341)
2024-01-12 10:59:59 -08:00
Iskren Ivov Chernev
d0215a58e7
Ensure metrics are logged regardless of requests (#2347)
2024-01-05 05:24:42 -08:00
Harry Mellor
08133c4d1a
Add SSL arguments to API servers (#2109)
2023-12-18 10:56:23 +08:00
Simon Mo
2e8fc0d4c3
Fix completion API echo and logprob combo (#1992)
2023-12-10 13:20:30 -08:00
Jin Shang
1aa1361510
Fix OpenAI server completion_tokens referenced before assignment (#1996)
2023-12-09 21:01:21 -08:00
Roy
60dc62dc9e
add custom server params (#1868)
2023-12-03 12:59:18 -08:00
Simon Mo
5313c2cb8b
Add Production Metrics in Prometheus format (#1890)
2023-12-02 16:37:44 -08:00
Adam Brusselback
66785cc05c
Support chat template and echo for chat API (#1756)
2023-11-30 16:43:13 -08:00
Michael McCulloch
c782195662
Disable Logs Requests should Disable Logging of requests. (#1779)
...
Co-authored-by: Michael McCulloch <mjm.gitlab@fastmail.com>
2023-11-29 21:50:02 -08:00
Yunmo Chen
665cbcec4b
Added echo function to OpenAI API server. (#1504)
2023-11-26 21:29:17 -08:00