Xu Song
067fa2255b
[Bugfix]Fix search start_index of stop_checker (#13280)
2025-02-14 21:39:42 -08:00
Joe Runde
3bcb8c75da
[Core] Reduce TTFT with concurrent partial prefills (#10235)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-14 15:36:07 -08:00
Keyun Tong
3ee696a63d
[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518)
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
2025-02-12 12:25:58 +08:00
Jewon Lee
bf3e05215c
[Misc] Fix typo at comments at metrics.py (#13024)
2025-02-11 08:20:37 -08:00
wangxiyuan
2e3b969ec0
[Platform] add pre_register_and_update function (#12432)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 22:06:46 +08:00
youkaichao
91dd8f7aa6
[bugfix] respect distributed_executor_backend in world_size=1 (#12934)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-08 16:17:08 +08:00
Arthur
a1a2aaadb9
[Model]: Add transformers backend support (#11330)
# Adds support for `transformers` as a backend
Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, and we are ramping up support for more models.
Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the Hub that implements attention the correct way can be natively supported!
- tensor parallel support
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-02-03 21:30:38 +08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to Python source files, as recommended to the project by the Linux Foundation. These headers provide a concise, human- and machine-readable way to communicate the license of each source file. They help avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
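The log does not show the hook added in "Check for SPDX headers using pre-commit". As a rough illustration of what such a check can do, here is a minimal Python sketch. It assumes the expected header is the exact line `# SPDX-License-Identifier: Apache-2.0` (matching an Apache-2.0 licensed project), optionally preceded by a shebang; the names `SPDX_HEADER`, `has_spdx_header`, `find_missing`, and `main` are hypothetical and are not the script from this PR.

```python
from pathlib import Path

# Assumption: the header line an Apache-2.0 project would expect.
SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(text: str) -> bool:
    """Return True if the SPDX header is the first line (after an optional shebang)."""
    lines = text.splitlines()
    # Allow a shebang such as "#!/usr/bin/env python" before the header.
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    # An empty file has no header, so it is reported as missing.
    return bool(lines) and lines[0].strip() == SPDX_HEADER


def find_missing(paths: list[str]) -> list[str]:
    """Return the .py files among `paths` whose content lacks the header."""
    missing = []
    for p in paths:
        path = Path(p)
        if path.suffix != ".py":
            continue  # only Python sources are checked
        if not has_spdx_header(path.read_text(encoding="utf-8")):
            missing.append(p)
    return missing


def main(argv: list[str]) -> int:
    """Print offending files and return a process exit status."""
    missing = find_missing(argv)
    for p in missing:
        print(f"Missing SPDX header: {p}")
    return 1 if missing else 0


if __name__ == "__main__":
    import sys

    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

By default, pre-commit passes the staged file names to a hook as command-line arguments, and a non-zero exit status marks the hook as failed, which is why `main` returns 1 when any file is missing the header.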
Lucas Wilkinson
cabaf4eff3
[Attention] MLA decode optimizations (#12528)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-01-30 23:49:37 -08:00
Yanyi Liu
ff7424f491
[Frontend] Support override generation config in args (#12409)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-01-29 01:41:01 -08:00
Robert Shaw
e29d4358ef
[V1] Include Engine Version in Logs (#12496)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-28 08:27:41 +00:00
Nicolò Lucchesi
6116ca8cd7
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and prompt_logprobs with ChunkedPrefill (#10132)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>
2025-01-27 13:38:35 -08:00
Matthew Hendrey
9ddc35220b
[Frontend] generation_config.json for maximum tokens (#12242)
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-26 19:59:25 +08:00
Cyrus Leung
df5dafaa5b
[Misc] Remove deprecated code (#12383)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-24 14:45:20 -05:00
Woosuk Kwon
0e74d797ce
[V1] Increase default batch size for H100/H200 (#12369)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-24 03:19:55 +00:00
Gregory Shtrasberg
e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
youkaichao
6e650f56a1
[torch.compile] decouple compile sizes and cudagraph sizes (#12243)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-24 02:01:30 +08:00
Cody Yu
7206ce4ce1
[Core] Support reset_prefix_cache (#12284)
2025-01-22 18:52:27 +00:00
Konrad Zawora
96f6a7596f
[Bugfix] Fix HPU multiprocessing executor (#12167)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-01-23 02:07:07 +08:00
youkaichao
68ad4e3a8d
[Core] Support fully transparent sleep mode (#11743)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-22 14:39:32 +08:00
Aleksandr Malyshev
69196a9bc7
[BUGFIX] When skip_tokenize_init and multistep are set, execution crashes (#12277)
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: maleksan85 <maleksan@amd.com>
2025-01-21 23:30:46 +00:00
Adrian Cole
347eeebe3b
[Misc] Remove experimental dep from tracing.py (#12007)
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
2025-01-21 11:51:55 -08:00
Jannis Schönleber
9705b90bcf
[Bugfix] fix race condition that leads to wrong order of token returned (#10802)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
2025-01-21 09:47:04 -08:00
Cyrus Leung
59a0192fb9
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
Yuan Tang
d2643128f7
[DOC] Add missing docstring in LLMEngine.add_request() (#12195)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:59:00 +08:00
Yuan Tang
c5c06209ec
[DOC] Fix typo in docstring and assert message (#12194)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:58:29 +08:00
youkaichao
87a0c076af
[core] allow callable in collective_rpc (#12151)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-17 20:47:01 +08:00
Jee Jee Li
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages (#12102)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
youkaichao
bf53e0c70b
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 19:58:53 +08:00
maang-h
57e729e874
[Doc]: Update OpenAI-Compatible Server documents (#12082)
2025-01-15 16:07:45 +00:00
youkaichao
ad34c0df0f
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-15 13:45:21 +08:00
maang-h
87054a57ab
[Doc]: Update the Json Example of the Engine Arguments document (#12045)
2025-01-14 17:03:04 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Jie Fu (傅杰)
a4e2b26856
[Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (#11794)
2025-01-07 16:15:50 -08:00
Cyrus Leung
ee77fdb5de
[Doc][2/N] Reorganize Models and Usage sections (#11755)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 21:40:31 +08:00
youkaichao
b12e87f942
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-30 20:24:45 +08:00
Rajveer Bachkaniwala
b5cbe8eeb3
[Bugfix] Last token measurement fix (#11376)
Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-12-28 11:34:46 +08:00
Rafael Vasquez
32aa2059ad
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-23 22:35:38 +00:00
yansh97
94d545a1a1
[Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440)
2024-12-23 20:20:44 +00:00
Ricky Xu
584f0ae40d
[V1] Make AsyncLLMEngine v1-v0 opaque (#11383)
Signed-off-by: Ricky Xu <xuchen727@hotmail.com>
2024-12-21 15:14:08 +08:00
omer-dayan
995f56236b
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
2024-12-20 16:46:24 +00:00
Yanyi Liu
5aef49806d
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00
Alexander Matveev
fdea8ec167
[V1] VLM - enable processor cache by default (#11305)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
2024-12-18 18:54:46 -05:00
Konrad Zawora
866fa4550d
[Bugfix] Restore support for larger block sizes (#11259)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2024-12-17 16:39:07 -08:00
Cody Yu
bf8717ebae
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-12-17 16:37:59 -08:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling (#11190)
2024-12-17 12:26:32 -08:00
wangxiyuan
e88db68cf5
[Platform] platform agnostic for EngineArgs initialization (#11225)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-12-16 22:11:06 -08:00
youkaichao
551603feff
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00
chenqianfzh
69ba344de8
[Bugfix] Fix block size validation (#10938)
2024-12-15 16:38:40 -08:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00