Nick Hill
acd5511b6d
[BugFix] Fix clean shutdown issues (#8492)
2024-09-16 09:33:46 -07:00

lewtun
837c1968f9
[Frontend] Expose revision arg in OpenAI server (#8501)
2024-09-16 15:55:26 +00:00

Jiaxin Shan
db3bf7c991
[Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00

Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00

Nick Hill
d4db9f53c8
[Benchmark] Add --async-engine option to benchmark_throughput.py (#7964)
2024-09-03 20:57:41 -04:00

Joe Runde
b903e1ba7f
[Frontend] error suppression cleanup (#7786)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-22 21:50:21 +00:00

Joe Runde
cde9183b40
[Bug][Frontend] Improve ZMQ client robustness (#7443)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-22 02:18:11 +00:00

William Lin
dd53c4b023
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00

Robert Shaw
970dfdc01d
[Frontend] Improve Startup Failure UX (#7716)
2024-08-21 19:53:01 +00:00

Robert Shaw
f7e3b0c5aa
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-21 13:34:14 -04:00

Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00

Robert Shaw
e3b318216d
[ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-18 20:19:48 +00:00

Gordon Wong
0e39a33c6d
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513)
2024-08-16 10:05:18 -06:00

Michael Goin
9c8e2d1161
[Bugfix][Harmless] Fix float16 dtype for model_is_embedding (#7566)
2024-08-15 18:26:19 -07:00

youkaichao
33e5d7e6b6
[frontend] spawn engine process from api server process (#7484)
2024-08-13 15:40:17 -07:00

Joe Runde
21b9c49aa3
[Frontend] Kill the server on engine death (#6594)
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-08-08 09:47:48 -07:00

Maximilien de Bayser
fde47d3bc2
[BugFix] Fix frontend multiprocessing hang (#7217)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-08-07 18:09:36 +00:00

Robert Shaw
564985729a
[ BugFix ] Move zmq frontend to IPC instead of TCP (#7222)
2024-08-07 16:24:56 +00:00

Aditya Paliwal
57f560aa23
[BugFix] Use args.trust_remote_code (#7121)
2024-08-05 09:26:14 -07:00

Cyrus Leung
cc08fc7225
[Frontend] Reapply "Factor out code for running uvicorn" (#7095)
2024-08-04 20:40:51 -07:00

Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00

Simon Mo
7eb0cb4a14
Revert "[Frontend] Factor out code for running uvicorn" (#7012)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-07-31 16:34:26 -07:00

Cyrus Leung
981b0d5673
[Frontend] Factor out code for running uvicorn (#6828)
2024-07-27 09:58:25 +08:00

Evan Z. Liu
5689e256ba
[Frontend] Represent tokens with identifiable strings (#6626)
2024-07-25 09:51:00 +08:00

Daniele
ee812580f7
[Frontend] split run_server into build_server and run_server (#6740)
2024-07-24 10:36:04 -07:00

Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00

Daniele
51f8aa90ad
[Bugfix][Frontend] remove duplicate init logger (#6581)
2024-07-19 10:16:27 -07:00

Cyrus Leung
6366efc67b
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463)
2024-07-19 03:55:13 +00:00

Nick Hill
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00

sasha0552
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
2024-07-16 20:18:09 +08:00

Ethan Xu
dbfe254eda
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00

Swapnil Parekh
4d6ada947c
[CORE] Adding support for insertion of soft-tuned prompts (#4645)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-09 13:26:36 -07:00

youkaichao
3b08fe2b13
[misc][frontend] log all available endpoints (#6195)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-07-07 15:11:12 -07:00

xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00

sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints (#5054)
2024-06-26 16:54:22 +00:00

Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils (#5098)
2024-06-13 11:21:39 -07:00

Roger Wang
68bc81703e
[Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (#5374)
2024-06-10 09:13:39 +00:00

Nadav Shmayovits
37464a0f74
[Bugfix] Fix call to init_logger in openai server (#4765)
2024-06-01 17:18:50 +00:00

Pierre Dulac
9216b9cc38
[Bugfix] Bypass authorization API token for preflight requests (#4862)
2024-05-16 09:42:21 -07:00

Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)
2024-05-11 11:30:37 -07:00

Cyrus Leung
f12b20decc
[Frontend] Move async logic outside of constructor (#4674)
2024-05-08 22:48:33 -07:00

Cyrus Leung
323f27b904
[Bugfix] Fix asyncio.Task not being subscriptable (#4623)
2024-05-06 09:31:05 -07:00

Yang, Bo
808632d3b4
[BugFix] Prevent the task of _force_log from being garbage collected (#4567)
2024-05-03 01:35:18 +00:00

youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables (#4548)
2024-05-02 11:13:25 -07:00

Robert Shaw
4dc8026d86
[Bugfix] Fix 307 Redirect for /metrics (#4523)
2024-05-01 09:14:13 -07:00

SangBin Cho
a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00

SangBin Cho
0ae11f78ab
[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)
2024-04-22 21:32:44 -07:00

Harry Mellor
66ded03067
Allow model to be served under multiple names (#2894)
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
2024-04-18 00:16:26 -07:00

A-Mahla
0739b1947f
[Frontend][Bugfix] allow using the default middleware with a root path (#3788)
Co-authored-by: A-Mahla <>
2024-04-02 01:20:28 -07:00

yhu422
d8658c8cc1
Usage Stats Collection (#2852)
2024-03-28 22:16:12 -07:00