Patrick von Platen
|
11cd1ae6ad
|
[Tool parsing] Improve / correct mistral tool parsing (#10333)
|
2024-11-15 00:42:49 +00:00 |
|
Guillaume Calmettes
|
52b48c1ead
|
[BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2024-11-14 04:48:16 +00:00 |
|
Mike Depinet
|
f67ce05d0b
|
[Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
|
2024-11-14 04:14:34 +00:00 |
|
Cyrus Leung
|
0b8bb86bf1
|
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-13 12:39:03 +00:00 |
|
zifeitong
|
47db6ec831
|
[Frontend] Add per-request number of cached token stats (#10174)
|
2024-11-12 16:42:28 +00:00 |
|
Cyrus Leung
|
06386a64dd
|
[Frontend] Chat-based Embeddings API (#9759)
|
2024-11-01 08:13:35 +00:00 |
|
Zhong Qishuai
|
ef7865b4f9
|
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
|
2024-10-29 11:49:47 +00:00 |
|
Vinay R Damodaran
|
33bab41060
|
[Bugfix]: Make chat content text allow type content (#9358)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
|
2024-10-24 05:05:49 +00:00 |
|
Yuhong Guo
|
434984e665
|
[Frontend] Support custom request_id from request (#9550)
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
|
2024-10-22 18:07:30 +00:00 |
|
Cyrus Leung
|
390be74649
|
[Misc] Print stack trace using logger.exception (#9461)
|
2024-10-17 13:55:48 +00:00 |
|
Chang Su
|
ba30942240
|
[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-15 15:40:43 -07:00 |
|
Nick Hill
|
e9d517f276
|
[BugFix] Fix chat API continuous usage stats (#9357)
|
2024-10-14 23:19:48 -07:00 |
|
Brendan Wong
|
4d31cd424b
|
[Frontend] merge beam search implementations (#9296)
|
2024-10-14 15:05:52 -07:00 |
|
Maximilien de Bayser
|
ec10cb8511
|
[BugFix] Fix tool call finish reason in streaming case (#9209)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-10-11 18:24:26 -07:00 |
|
Brendan Wong
|
8c746226c9
|
[Frontend] API support for beam search for MQLLMEngine (#9117)
|
2024-10-08 05:51:43 +00:00 |
|
Yanyi Liu
|
fdf59d30ea
|
[Bugfix] fix tool_parser error handling when serve a model not support it (#8709)
|
2024-10-06 12:51:08 +00:00 |
|
Brendan Wong
|
168cab6bbf
|
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-10-05 23:39:03 -07:00 |
|
代君
|
3dbb215b38
|
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)
|
2024-10-04 10:36:39 +08:00 |
|
Sebastian Schoennenbeck
|
35bd215168
|
[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)
|
2024-10-01 09:58:06 +00:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
danieljannai21
|
6c9ba48fde
|
[Frontend] Added support for HF's new continue_final_message parameter (#8942)
|
2024-09-29 17:59:47 +00:00 |
|
Maximilien de Bayser
|
344cd2b6f4
|
[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-09-26 17:01:42 -07:00 |
|
Nick Hill
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
Pernekhan Utemuratov
|
93d364da34
|
[Bugfix] Include encoder prompts len to non-stream api usage response (#8861)
|
2024-09-26 15:47:00 -07:00 |
|
Chen Zhang
|
770ec6024f
|
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-25 13:29:32 -07:00 |
|
Adam Tilghman
|
1ac3de09cd
|
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)
|
2024-09-25 07:49:26 +00:00 |
|
Jiaxin Shan
|
260d40b5ea
|
[Core] Support Lora lineage and base model metadata management (#6315)
|
2024-09-20 06:20:56 +00:00 |
|
Alexander Matveev
|
7c7714d856
|
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-18 13:56:58 +00:00 |
|
Patrick von Platen
|
a54ed80249
|
[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-17 17:50:37 +00:00 |
|
Nick Hill
|
551ce01078
|
[Core] Add engine option to return only deltas or final output (#7381)
|
2024-09-12 12:02:00 -07:00 |
|
tomeras91
|
5a60699c45
|
[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)
|
2024-09-12 03:55:30 +00:00 |
|
Cyrus Leung
|
8c054b7a62
|
[Frontend] Clean up type annotations for mistral tokenizer (#8314)
|
2024-09-10 16:49:11 +00:00 |
|
Kyle Mistele
|
08287ef675
|
[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272)
|
2024-09-09 10:45:11 -04:00 |
|
Kyle Mistele
|
e02ce498be
|
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
|
2024-09-04 13:18:13 -07:00 |
|
Cyrus Leung
|
855c262a6b
|
[Frontend] Multimodal support in offline chat (#8098)
|
2024-09-04 05:22:17 +00:00 |
|
Roger Wang
|
5231f0898e
|
[Frontend][VLM] Add support for multiple multi-modal items (#8049)
|
2024-08-31 16:35:53 -07:00 |
|
Patrick von Platen
|
6fc4e6e07a
|
[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)
|
2024-08-27 12:40:02 +00:00 |
|
Cyrus Leung
|
baaedfdb2d
|
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
|
2024-08-20 23:28:21 -07:00 |
|
Grant Pinkert
|
f878c8feb0
|
[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453)
|
2024-08-16 02:38:08 +00:00 |
|
Cyrus Leung
|
66d617e343
|
[Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-07 09:12:05 +00:00 |
|
Nick Hill
|
9a3f49ae07
|
[BugFix] Overhaul async request cancellation (#7111)
|
2024-08-07 13:21:41 +08:00 |
|
Cyrus Leung
|
8c025fa703
|
[Frontend] Factor out chat message parsing (#7055)
|
2024-08-02 21:31:27 -07:00 |
|
Robert Shaw
|
ed812a73fa
|
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-08-02 18:27:28 -07:00 |
|
zifeitong
|
3c10591ef2
|
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954)
|
2024-07-31 21:13:34 -07:00 |
|
Nick Hill
|
9f69d8245a
|
[Frontend] New allowed_token_ids decoding request parameter (#6753)
|
2024-07-29 23:37:27 +00:00 |
|
Evan Z. Liu
|
5689e256ba
|
[Frontend] Represent tokens with identifiable strings (#6626)
|
2024-07-25 09:51:00 +08:00 |
|
Yehoshua Cohen
|
58f53034ad
|
[Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652)
|
2024-07-23 11:41:55 -07:00 |
|
Cyrus Leung
|
739b61a348
|
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-22 10:13:53 -07:00 |
|
Cyrus Leung
|
d7f4178dd9
|
[Frontend] Move chat utils (#6602)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-21 08:38:17 +08:00 |
|
Nick Hill
|
e2fbaee725
|
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-18 15:13:30 +08:00 |
|