Commit Graph

96 Commits

Author SHA1 Message Date
Patrick von Platen
11cd1ae6ad [Tool parsing] Improve / correct mistral tool parsing (#10333) 2024-11-15 00:42:49 +00:00
Guillaume Calmettes
52b48c1ead [BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-14 04:48:16 +00:00
Mike Depinet
f67ce05d0b [Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
2024-11-14 04:14:34 +00:00
Cyrus Leung
0b8bb86bf1 [1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
zifeitong
47db6ec831 [Frontend] Add per-request number of cached token stats (#10174) 2024-11-12 16:42:28 +00:00
Cyrus Leung
06386a64dd [Frontend] Chat-based Embeddings API (#9759) 2024-11-01 08:13:35 +00:00
Zhong Qishuai
ef7865b4f9 [Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
2024-10-29 11:49:47 +00:00
Vinay R Damodaran
33bab41060 [Bugfix]: Make chat content text allow type content (#9358)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Yuhong Guo
434984e665 [Frontend] Support custom request_id from request (#9550)
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2024-10-22 18:07:30 +00:00
Cyrus Leung
390be74649 [Misc] Print stack trace using logger.exception (#9461) 2024-10-17 13:55:48 +00:00
Chang Su
ba30942240 [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-15 15:40:43 -07:00
Nick Hill
e9d517f276 [BugFix] Fix chat API continuous usage stats (#9357) 2024-10-14 23:19:48 -07:00
Brendan Wong
4d31cd424b [Frontend] merge beam search implementations (#9296) 2024-10-14 15:05:52 -07:00
Maximilien de Bayser
ec10cb8511 [BugFix] Fix tool call finish reason in streaming case (#9209)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-10-11 18:24:26 -07:00
Brendan Wong
8c746226c9 [Frontend] API support for beam search for MQLLMEngine (#9117) 2024-10-08 05:51:43 +00:00
Yanyi Liu
fdf59d30ea [Bugfix] fix tool_parser error handling when serve a model not support it (#8709) 2024-10-06 12:51:08 +00:00
Brendan Wong
168cab6bbf [Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-05 23:39:03 -07:00
代君
3dbb215b38 [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405) 2024-10-04 10:36:39 +08:00
Sebastian Schoennenbeck
35bd215168 [Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965) 2024-10-01 09:58:06 +00:00
Joe Runde
062c89e7c9 [Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-01 09:34:25 +08:00
danieljannai21
6c9ba48fde [Frontend] Added support for HF's new continue_final_message parameter (#8942) 2024-09-29 17:59:47 +00:00
Maximilien de Bayser
344cd2b6f4 [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
Nick Hill
4b377d6feb [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Pernekhan Utemuratov
93d364da34 [Bugfix] Include encoder prompts len to non-stream api usage response (#8861) 2024-09-26 15:47:00 -07:00
Chen Zhang
770ec6024f [Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Adam Tilghman
1ac3de09cd [Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672) 2024-09-25 07:49:26 +00:00
Jiaxin Shan
260d40b5ea [Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
Alexander Matveev
7c7714d856 [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Patrick von Platen
a54ed80249 [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-17 17:50:37 +00:00
Nick Hill
551ce01078 [Core] Add engine option to return only deltas or final output (#7381) 2024-09-12 12:02:00 -07:00
tomeras91
5a60699c45 [Bugfix]: Fix the logic for deciding if tool parsing is used (#8366) 2024-09-12 03:55:30 +00:00
Cyrus Leung
8c054b7a62 [Frontend] Clean up type annotations for mistral tokenizer (#8314) 2024-09-10 16:49:11 +00:00
Kyle Mistele
08287ef675 [Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272) 2024-09-09 10:45:11 -04:00
Kyle Mistele
e02ce498be [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Cyrus Leung
855c262a6b [Frontend] Multimodal support in offline chat (#8098) 2024-09-04 05:22:17 +00:00
Roger Wang
5231f0898e [Frontend][VLM] Add support for multiple multi-modal items (#8049) 2024-08-31 16:35:53 -07:00
Patrick von Platen
6fc4e6e07a [Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739) 2024-08-27 12:40:02 +00:00
Cyrus Leung
baaedfdb2d [mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Grant Pinkert
f878c8feb0 [Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453) 2024-08-16 02:38:08 +00:00
Cyrus Leung
66d617e343 [Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-07 09:12:05 +00:00
Nick Hill
9a3f49ae07 [BugFix] Overhaul async request cancellation (#7111) 2024-08-07 13:21:41 +08:00
Cyrus Leung
8c025fa703 [Frontend] Factor out chat message parsing (#7055) 2024-08-02 21:31:27 -07:00
Robert Shaw
ed812a73fa [ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
zifeitong
3c10591ef2 [Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954) 2024-07-31 21:13:34 -07:00
Nick Hill
9f69d8245a [Frontend] New allowed_token_ids decoding request parameter (#6753) 2024-07-29 23:37:27 +00:00
Evan Z. Liu
5689e256ba [Frontend] Represent tokens with identifiable strings (#6626) 2024-07-25 09:51:00 +08:00
Yehoshua Cohen
58f53034ad [Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652) 2024-07-23 11:41:55 -07:00
Cyrus Leung
739b61a348 [Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Cyrus Leung
d7f4178dd9 [Frontend] Move chat utils (#6602)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-21 08:38:17 +08:00
Nick Hill
e2fbaee725 [BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00