| Author | Commit | Message | Date |
|---|---|---|---|
| Alex Brooks | 069d3bd8d0 | [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151); Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> | 2024-10-08 14:31:26 +00:00 |
| Brendan Wong | 8c746226c9 | [Frontend] API support for beam search for MQLLMEngine (#9117) | 2024-10-08 05:51:43 +00:00 |
| Brendan Wong | 168cab6bbf | [Frontend] API support for beam search (#9087); Co-authored-by: youkaichao <youkaichao@126.com> | 2024-10-05 23:39:03 -07:00 |
| Flávia Béo | 0dcc8cbe5a | Adds truncate_prompt_tokens param for embeddings creation (#8999); Signed-off-by: Flavia Beo <flavia.beo@ibm.com> | 2024-10-04 18:31:40 +00:00 |
| Roger Wang | 26aa325f4f | [Core][VLM] Test registration for OOT multimodal models (#8717); Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-10-04 10:38:25 -07:00 |
| Joe Runde | 062c89e7c9 | [Frontend][Core] Move guided decoding params into sampling params (#8252); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>; Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-10-01 09:34:25 +08:00 |
| danieljannai21 | 6c9ba48fde | [Frontend] Added support for HF's new continue_final_message parameter (#8942) | 2024-09-29 17:59:47 +00:00 |
| Nick Hill | 4b377d6feb | [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) | 2024-09-26 16:46:43 -07:00 |
| Alexander Matveev | 1a2aef3e59 | Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335) | 2024-09-23 15:38:04 -07:00 |
| Jiaxin Shan | 260d40b5ea | [Core] Support Lora lineage and base model metadata management (#6315) | 2024-09-20 06:20:56 +00:00 |
| Alexander Matveev | 7c7714d856 | [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157); Co-authored-by: Nick Hill <nickhill@us.ibm.com>; Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>; Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>; Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-09-18 13:56:58 +00:00 |
| Pooya Davoodi | cea95dfb94 | [Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347) | 2024-09-11 05:30:11 +00:00 |
| Jiaxin Shan | db3bf7c991 | [Core] Support load and unload LoRA in api server (#6566); Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> | 2024-09-05 18:10:33 -07:00 |
| Roger Wang | 5231f0898e | [Frontend][VLM] Add support for multiple multi-modal items (#8049) | 2024-08-31 16:35:53 -07:00 |
| Nick Hill | 39178c7fbc | [Tests] Disable retries and use context manager for openai client (#7565) | 2024-08-26 21:33:17 -07:00 |
| ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 | 0b769992ec | [Bugfix]: Use float32 for base64 embedding (#7855); Signed-off-by: Hollow Man <hollowman@opensuse.org> | 2024-08-26 03:16:38 +00:00 |
| Tyler Rockwood | d81abefd2e | [Frontend] add json_schema support from OpenAI protocol (#7654) | 2024-08-23 23:07:24 -07:00 |
| Pooya Davoodi | 8da48e4d95 | [Frontend] Publish Prometheus metrics in run_batch API (#7641) | 2024-08-23 23:04:22 -07:00 |
| Maximilien de Bayser | e25fee57c2 | [BugFix] Fix server crash on empty prompt (#7746); Signed-off-by: Max de Bayser <mbayser@br.ibm.com> | 2024-08-23 13:12:44 +00:00 |
| Joe Runde | b903e1ba7f | [Frontend] error suppression cleanup (#7786); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-22 21:50:21 +00:00 |
| Joe Runde | cde9183b40 | [Bug][Frontend] Improve ZMQ client robustness (#7443); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-22 02:18:11 +00:00 |
| Peter Salas | 1ca0d4f86b | [Model] Add UltravoxModel and UltravoxConfig (#7615) | 2024-08-21 22:49:39 +00:00 |
| Robert Shaw | 970dfdc01d | [Frontend] Improve Startup Failure UX (#7716) | 2024-08-21 19:53:01 +00:00 |
| Robert Shaw | f7e3b0c5aa | [Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394); Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-08-21 13:34:14 -04:00 |
| Cyrus Leung | baaedfdb2d | [mypy] Enable following imports for entrypoints (#7248); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>; Co-authored-by: Fei <dfdfcai4@gmail.com> | 2024-08-20 23:28:21 -07:00 |
| Robert Shaw | e3b318216d | [ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279); Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-08-18 20:19:48 +00:00 |
| Roger Wang | bbf55c4805 | [VLM] Refactor MultiModalConfig initialization and profiling (#7530) | 2024-08-17 13:30:55 -07:00 |
| Grant Pinkert | f878c8feb0 | [Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453) | 2024-08-16 02:38:08 +00:00 |
| youkaichao | 16422ea76f | [misc][plugin] add plugin system implementation (#7426) | 2024-08-13 16:24:17 -07:00 |
| youkaichao | 33e5d7e6b6 | [frontend] spawn engine process from api server process (#7484) | 2024-08-13 15:40:17 -07:00 |
| Peter Salas | 00c3d68e45 | [Frontend][Core] Add plumbing to support audio language models (#7446) | 2024-08-13 17:39:33 +00:00 |
| Cyrus Leung | 7025b11d94 | [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) | 2024-08-13 05:33:41 +00:00 |
| Andrew Wang | 97a6be95ba | [Misc] improve logits processors logging message (#7435) | 2024-08-13 02:29:34 +00:00 |
| Pooya Davoodi | 249b88228d | [Frontend] Support embeddings in the run_batch API (#7132); Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-08-09 09:48:21 -07:00 |
| Cyrus Leung | 7eb4a51c5f | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| Joe Runde | 21b9c49aa3 | [Frontend] Kill the server on engine death (#6594); Signed-off-by: Joe Runde <joe@joerun.de>; Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-08 09:47:48 -07:00 |
| Maximilien de Bayser | fde47d3bc2 | [BugFix] Fix frontend multiprocessing hang (#7217); Signed-off-by: Max de Bayser <mbayser@br.ibm.com>; Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-08-07 18:09:36 +00:00 |
| Cyrus Leung | 66d617e343 | [Frontend] Gracefully handle missing chat template and fix CI failure (#7238); Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-08-07 09:12:05 +00:00 |
| youkaichao | dfb1a15dcb | [ci][frontend] deduplicate tests (#7101) | 2024-08-05 15:59:22 -07:00 |
| Yihuan Bu | 654bc5ca49 | Support for guided decoding for offline LLM (#6878); Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-08-04 03:12:09 +00:00 |
| Robert Shaw | ed812a73fa | [ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>; Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>; Co-authored-by: Joe Runde <joe@joerun.de>; Co-authored-by: Nick Hill <nickhill@us.ibm.com>; Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-08-02 18:27:28 -07:00 |
| youkaichao | 806949514a | [ci] set timeout for test_oot_registration.py (#7082) | 2024-08-02 10:03:24 -07:00 |
| zifeitong | 3c10591ef2 | [Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954) | 2024-07-31 21:13:34 -07:00 |
| Nick Hill | 9f69d8245a | [Frontend] New allowed_token_ids decoding request parameter (#6753) | 2024-07-29 23:37:27 +00:00 |
| Chang Su | 316a41ac1d | [Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755) | 2024-07-24 22:48:07 -07:00 |
| Evan Z. Liu | 5689e256ba | [Frontend] Represent tokens with identifiable strings (#6626) | 2024-07-25 09:51:00 +08:00 |
| Yehoshua Cohen | 58f53034ad | [Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652) | 2024-07-23 11:41:55 -07:00 |
| Cyrus Leung | 97234be0ec | [Misc] Manage HTTP connections in one place (#6600) | 2024-07-22 21:32:02 -07:00 |
| Cyrus Leung | 739b61a348 | [Frontend] Refactor prompt processing (#4028); Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-07-22 10:13:53 -07:00 |
| Cyrus Leung | 6366efc67b | [Bugfix][Frontend] Fix missing /metrics endpoint (#6463) | 2024-07-19 03:55:13 +00:00 |