Gabriel Marinho
1c2bc7ead0
Truncation control for embedding models ( #14776 )
...
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Max de Bayser <mbayser@br.ibm.com >
2025-04-30 09:24:57 +08:00
Nick Hill
df6f3ce883
[Core] Remove prompt string from engine core data structures ( #17214 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-25 23:41:05 -07:00
Zijing Liu
53e8cf53a4
[V1][Metrics] Allow V1 AsyncLLM to use custom logger ( #14661 )
...
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-25 22:05:40 -07:00
Daniel Li
48cb2109b6
[V1] Move usage stats to worker and start logging TPU hardware ( #16211 )
2025-04-25 14:06:01 -06:00
Yinghai Lu
fe92176321
Add collective_rpc to llm engine ( #16999 )
...
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai >
2025-04-24 20:16:52 +00:00
Harry Mellor
0a05ed57e6
Simplify TokenizerGroup ( #16790 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 04:43:56 -07:00
Michael Goin
ed50f46641
[Bugfix] Enable V1 usage stats ( #16986 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-23 19:54:00 -07:00
Robert Shaw
2b05b8ce69
[V1][Frontend] Improve Shutdown And Logs ( #11737 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-16 19:48:34 -07:00
Xihui Cang
1666e66443
Add "/server_info" endpoint in api_server to retrieve the vllm_config. ( #16572 )
...
Signed-off-by: Xihui Cang <xihuicang@gmail.com >
2025-04-15 11:50:38 +00:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() ( #15500 )
...
Signed-off-by: Eric <erictang000@gmail.com >
2025-04-02 01:59:27 -07:00
Cyrus Leung
355f66348c
[V1] Remove legacy input registry ( #15673 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 23:34:34 -07:00
Nick Hill
15dac210f0
[V1] AsyncLLM data parallel ( #13923 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-27 16:14:41 -07:00
Nick Hill
9d72daf4ce
[V1][Perf] Simpler request output queues ( #15156 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-24 22:44:08 +00:00
Nick Hill
da6ea29f7a
[V1] Avoid redundant input processing in n>1 case ( #14985 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-20 22:24:10 -07:00
maobaolong
26dd972adb
[FEAT]Support reset prefix cache by specified device ( #15003 )
2025-03-19 10:54:41 -07:00
Jun Duan
74bc397b0a
[Core] Expose API endpoint /is_sleeping ( #14312 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com >
2025-03-15 06:28:14 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2025-03-14 22:02:20 -07:00
Nick Hill
f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts ( #14512 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-12 10:29:48 -07:00
Nick Hill
8ed5421aaa
[V1] Eagerly remove finished requests from the batch ( #14388 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-07 10:56:00 -08:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs ( #12388 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-07 07:19:11 -08:00
Nick Hill
872db2be0e
[V1] Simplify stats logging ( #14082 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-03 10:34:14 -08:00
Mark McLoughlin
4167252eaf
[V1] Refactor parallel sampling support ( #13774 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 08:15:27 -08:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Varun Sundar Rabindranath
03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions ( #13705 )
2025-02-25 00:18:02 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-02-24 08:29:41 -08:00
youkaichao
ba81163997
[core] add sleep and wake up endpoint and v1 support ( #12987 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: cennn <2523403608@qq.com >
Co-authored-by: cennn <2523403608@qq.com >
2025-02-20 12:41:17 +08:00
Mark McLoughlin
2ad1bc7afe
[V1][Metrics] Add iteration_tokens_total histogram from V0 ( #13288 )
2025-02-15 03:56:19 -08:00
Varun Sundar Rabindranath
cbc40128eb
[V1] LoRA - Enable Serving Usecase ( #12883 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-02-14 14:21:12 +08:00
Mark McLoughlin
75e6e14516
[V1][Metrics] Add several request timing histograms ( #12644 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-02-11 10:14:00 -05:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
Mark McLoughlin
c386c43ca3
[V1][Metrics] Add per-request prompt/generation_tokens histograms ( #12516 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 22:07:22 +00:00
Mark McLoughlin
3fd1fb63ef
[V1][Metrics] Hook up IterationStats for Prometheus metrics ( #12478 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 16:38:38 +00:00
Mark McLoughlin
01ba927040
[V1][Metrics] Add initial Prometheus logger ( #12416 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-27 12:26:28 -05:00
Nick Hill
24b0205f58
[V1][Frontend] Coalesce bunched RequestOutputs ( #12298 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2025-01-23 17:17:41 -08:00
Nick Hill
aea94362c9
[Frontend][V1] Online serving performance improvements ( #12287 )
2025-01-22 22:22:12 +00:00
Cody Yu
7206ce4ce1
[Core] Support reset_prefix_cache ( #12284 )
2025-01-22 18:52:27 +00:00
Robert Shaw
619ae268c3
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction ( #11973 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-01-13 04:54:10 +00:00
Robert Shaw
9597a095f2
[V1][Core][1/n] Logging and Metrics ( #11962 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-01-12 21:02:02 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-01-10 15:56:36 +08:00
Rui Qiao
022c5c6944
[V1] Refactor get_executor_cls ( #11754 )
2025-01-06 07:59:16 +00:00
Kunshang Ji
fbf2564554
[V1] Add RayExecutor support for AsyncLLM (api server) ( #11712 )
2025-01-04 06:41:31 +00:00
Robert Shaw
1543914c04
[V1] Improve TP>1 Error Handling + Stack Trace ( #11721 )
...
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-01-03 21:29:11 +00:00
Robert Shaw
80c751e7f6
[V1] Simplify Shutdown ( #11659 )
2025-01-03 17:25:38 +00:00
Robert Shaw
5886aa496e
[V1] [6/N] API Server: Better Shutdown ( #11586 )
2024-12-30 15:51:02 +00:00
Robert Shaw
4fb8e329fd
[V1] [5/N] API Server: unify Detokenizer and EngineCore input ( #11545 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2024-12-28 20:51:57 +00:00
Robert Shaw
df04dffade
[V1] [4/N] API Server: ZMQ/MP Utilities ( #11541 )
2024-12-28 01:45:08 +00:00
Robert Shaw
1b875a0ef3
[V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly ( #11534 )
2024-12-26 21:19:21 -08:00
Ricky Xu
584f0ae40d
[V1] Make AsyncLLMEngine v1-v0 opaque ( #11383 )
...
Signed-off-by: Ricky Xu <xuchen727@hotmail.com >
2024-12-21 15:14:08 +08:00
Cody Yu
bf8717ebae
[V1] Prefix caching for vision language models ( #11187 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2024-12-17 16:37:59 -08:00
Mark McLoughlin
6d917d0eeb
Enable mypy checking on V1 code ( #11105 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2024-12-14 09:54:04 -08:00