Vincent
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 ( #13997 )
...
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com >
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-05 20:22:43 +00:00
Benjamin Chislett
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param ( #14066 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-03-05 06:30:40 +00:00
Harry Mellor
e5b2f1601a
[Frontend] Do prompt_logprobs clamping for chat as well as completions ( #14225 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-04 20:13:06 +00:00
Harry Mellor
9badee53de
Fix performance when --generation-config is not None ( #14223 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-04 20:59:22 +01:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Keyun Tong
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
Sebastian Schoennenbeck
2079e43bee
[Core] Make raw_request optional in ServingCompletion ( #12503 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-01-28 10:56:45 +00:00
Harry Mellor
823ab79633
Update pre-commit hooks ( #12475 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-01-27 17:23:08 -07:00
Robert Shaw
33fc1e2e86
[Frontend] Improve StreamingResponse Exception Handling ( #11752 )
2025-01-05 16:35:01 -05:00
Joe Runde
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend ( #11660 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-01-01 02:21:51 +00:00
Yanyi Liu
5aef49806d
[Feature] Add load generation config from model ( #11164 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com >
Signed-off-by: Yanyi Liu <wolfsonliu@163.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2024-12-19 10:50:38 +00:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling ( #11190 )
Create Release / Create Release (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
2024-12-17 12:26:32 -08:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument ( #11150 )
...
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com >
2024-12-14 16:46:42 +00:00
Jiaxin Shan
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name ( #11094 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2024-12-12 09:25:16 +00:00
Rafael Vasquez
40766ca1b8
[Bugfix]: Clamp -inf logprob values in prompt_logprobs ( #11073 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com >
2024-12-11 01:27:39 -08:00
Joe Runde
980ad394a8
[Frontend] Use request id from header ( #10968 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2024-12-10 13:46:29 +08:00
tomeras91
395b1c7454
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server ( #10635 )
...
Signed-off-by: Tomer Asida <tomera@ai21.com >
2024-11-27 13:21:10 -08:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor ( #10044 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-13 12:39:03 +00:00
Nick Hill
29862b884b
[Frontend] Adjust try/except blocks in API impl ( #10056 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2024-11-06 20:07:51 -08:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API ( #9759 )
2024-11-01 08:13:35 +00:00
Zhong Qishuai
ef7865b4f9
[Frontend] re-enable multi-modality input in the new beam search implementation ( #9427 )
...
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
2024-10-29 11:49:47 +00:00
Nick Hill
25aeb7d4c9
[BugFix] Fix and simplify completion API usage streaming ( #9475 )
2024-10-18 14:10:26 +00:00
Chang Su
ba30942240
[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids ( #9034 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
2024-10-15 15:40:43 -07:00
Brendan Wong
4d31cd424b
[Frontend] merge beam search implementations ( #9296 )
2024-10-14 15:05:52 -07:00
Brendan Wong
8c746226c9
[Frontend] API support for beam search for MQLLMEngine ( #9117 )
2024-10-08 05:51:43 +00:00
Brendan Wong
168cab6bbf
[Frontend] API support for beam search ( #9087 )
...
Co-authored-by: youkaichao <youkaichao@126.com >
2024-10-05 23:39:03 -07:00
Sebastian Schoennenbeck
35bd215168
[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API ( #8965 )
2024-10-01 09:58:06 +00:00
Joe Runde
062c89e7c9
[Frontend][Core] Move guided decoding params into sampling params ( #8252 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
2024-10-01 09:34:25 +08:00
Adam Tilghman
1ac3de09cd
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer ( #8672 )
2024-09-25 07:49:26 +00:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management ( #6315 )
2024-09-20 06:20:56 +00:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2024-09-18 13:56:58 +00:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output ( #7381 )
2024-09-12 12:02:00 -07:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints ( #7248 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Fei <dfdfcai4@gmail.com >
2024-08-20 23:28:21 -07:00
Grant Pinkert
f878c8feb0
[Feature]: Add OpenAI server prompt_logprobs support #6508 ( #7453 )
2024-08-16 02:38:08 +00:00
Nick Hill
9a3f49ae07
[BugFix] Overhaul async request cancellation ( #7111 )
2024-08-07 13:21:41 +08:00
Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq ( #6883 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Joe Runde <joe@joerun.de >
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2024-08-02 18:27:28 -07:00
zifeitong
3c10591ef2
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user ( #6954 )
2024-07-31 21:13:34 -07:00
Nick Hill
9f69d8245a
[Frontend] New allowed_token_ids decoding request parameter ( #6753 )
2024-07-29 23:37:27 +00:00
Evan Z. Liu
5689e256ba
[Frontend] Represent tokens with identifiable strings ( #6626 )
2024-07-25 09:51:00 +08:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing ( #4028 )
...
Co-authored-by: Roger Wang <ywang@roblox.com >
2024-07-22 10:13:53 -07:00
Nick Hill
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs ( #6227 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2024-07-18 15:13:30 +08:00
sasha0552
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint ( #5923 )
2024-07-16 20:18:09 +08:00
Swapnil Parekh
4d6ada947c
[CORE] Adding support for insertion of soft-tuned prompts ( #4645 )
...
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com >
Co-authored-by: Joe G <joseph.granados@h2o.ai >
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com >
2024-07-09 13:26:36 -07:00
kczimm
16620f439d
do not exclude object field in CompletionStreamResponse ( #6196 )
2024-07-08 10:32:57 +08:00
jvlunteren
f1e15da6fe
[Frontend] Continuous usage stats in OpenAI completion API ( #5742 )
2024-07-05 10:37:09 -07:00
Alexander Matveev
3476ed0809
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) ( #5602 )
2024-07-01 20:10:37 -07:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints ( #5054 )
2024-06-26 16:54:22 +00:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support ( #4687 )
...
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
I've also added a markdown with print-screens to guide users how to use this feature. You can find it here
2024-06-19 01:17:03 +09:00
Cyrus Leung
640052b069
[Bugfix][Frontend] Cleanup "fix chat logprobs" ( #5026 )
2024-06-10 22:36:46 -07:00