Mike Depinet
|
f67ce05d0b
|
[Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
|
2024-11-14 04:14:34 +00:00 |
|
Russell Bryant
|
e0853b6508
|
[Misc] format.sh: Simplify tool_version_check (#10305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-14 11:12:35 +08:00 |
|
youkaichao
|
504ac53d18
|
[misc] error early for old-style class (#10304)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-13 18:55:39 -08:00 |
|
Isotr0py
|
15bb8330aa
|
[Bugfix] Fix tensor parallel for qwen2 classification model (#10297)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-14 10:54:59 +08:00 |
|
HoangCongDuc
|
ac49b59d8b
|
[Bugfix] bitsandbytes models fail to run pipeline parallel (#10200)
Signed-off-by: Hoang Cong Duc <hoangcongducltt@gmail.com>
|
2024-11-13 09:56:39 -07:00 |
|
Cyrus Leung
|
0b8bb86bf1
|
[1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-13 12:39:03 +00:00 |
|
Roger Wang
|
bb7991aa29
|
[V1] Add missing tokenizer options for Detokenizer (#10288)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-13 11:02:56 +00:00 |
|
B-201
|
d909acf9fe
|
[Model][LoRA]LoRA support added for idefics3 (#10281)
Signed-off-by: B-201 <Joy25810@foxmail.com>
|
2024-11-13 17:25:59 +08:00 |
|
Pavani Majety
|
b6dde33019
|
[Core] Flashinfer - Remove advance step size restriction (#10282)
|
2024-11-13 16:29:32 +08:00 |
|
Austin Veselka
|
1b886aa104
|
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944)
Signed-off-by: FurtherAI <austin.veselka@lighton.ai>
Co-authored-by: FurtherAI <austin.veselka@lighton.ai>
|
2024-11-13 08:28:13 +00:00 |
|
电脑星人
|
3945c82346
|
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221)
Signed-off-by: imkero <kerorek@outlook.com>
|
2024-11-13 07:07:22 +00:00 |
|
Xin Yang
|
032fcf16ae
|
[Doc] Fix typo in arg_utils.py (#10264)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-11-12 21:54:52 -08:00 |
|
Dipika Sikka
|
56a955e774
|
Bump to compressed-tensors v0.8.0 (#10279)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2024-11-12 21:54:10 -08:00 |
|
Woosuk Kwon
|
bbd3e86926
|
[V1] Support VLMs with fine-grained scheduling (#9871)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-11-13 04:53:13 +00:00 |
|
youkaichao
|
0d4ea3fb5c
|
[core][distributed] use tcp store directly (#10275)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 17:36:08 -08:00 |
|
Woosuk Kwon
|
112fa0bbe5
|
[V1] Fix CI tests on V1 engine (#10272)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 16:17:20 -08:00 |
|
youkaichao
|
377b74fe87
|
Revert "[ci][build] limit cmake version" (#10271)
|
2024-11-12 15:06:48 -08:00 |
|
youkaichao
|
18081451f9
|
[doc] improve debugging doc (#10270)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 14:43:52 -08:00 |
|
youkaichao
|
96ae0eaeb2
|
[doc] fix location of runllm widget (#10266)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 14:34:39 -08:00 |
|
Woosuk Kwon
|
1f55e05713
|
[V1] Enable Inductor when using piecewise CUDA graphs (#10268)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 13:39:56 -08:00 |
|
Umesh
|
8a06428c70
|
[LoRA] Adds support for bias in LoRA (#5733)
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
|
2024-11-12 11:08:40 -08:00 |
|
sroy745
|
b41fb9d3b1
|
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
|
2024-11-12 10:53:57 -08:00 |
|
Woosuk Kwon
|
7c65527918
|
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-12 08:57:14 -08:00 |
|
zifeitong
|
47db6ec831
|
[Frontend] Add per-request number of cached token stats (#10174)
|
2024-11-12 16:42:28 +00:00 |
|
Jie Fu (傅杰)
|
176fcb1c71
|
[Bugfix] Fix QwenModel argument (#10262)
Signed-off-by: Jie Fu <jiefu@tencent.com>
|
2024-11-12 16:36:51 +00:00 |
|
Jee Jee Li
|
a838ba7254
|
[Misc]Fix Idefics3Model argument (#10255)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-12 13:07:11 +00:00 |
|
Guillaume Calmettes
|
36c513a076
|
[BugFix] Do not raise a ValueError when tool_choice is set to the supported none option and tools are not defined. (#10000)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2024-11-12 11:13:46 +00:00 |
|
Yuan
|
d201d41973
|
[CI][CPU]refactor CPU tests to allow to bind with different cores (#10222)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-11-12 10:07:32 +00:00 |
|
youkaichao
|
3a28f18b0b
|
[doc] explain the class hierarchy in vLLM (#10240)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 22:56:44 -08:00 |
|
Aleksandr Malyshev
|
812c981fa0
|
Splitting attention kernel file (#10091)
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2024-11-11 22:55:07 -08:00 |
|
Jee Jee Li
|
7f5edb5900
|
[Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-12 11:10:15 +08:00 |
|
youkaichao
|
eea55cca5b
|
[1/N] torch.compile user interface design (#10237)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 18:01:06 -08:00 |
|
Russell Bryant
|
9cdba9669c
|
[Doc] Update help text for --distributed-executor-backend (#10231)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-12 09:55:09 +08:00 |
|
youkaichao
|
d1c6799b88
|
[doc] update debugging guide (#10236)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 15:21:12 -08:00 |
|
Robert Shaw
|
6ace6fba2c
|
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-11 23:05:38 +00:00 |
|
Nikolai Shcheglov
|
08f93e7439
|
Make shutil rename in python_only_dev (#10233)
Signed-off-by: shcheglovnd <shcheglovnd@avride.ai>
|
2024-11-11 14:29:19 -08:00 |
|
Woosuk Kwon
|
9d5b4e4dea
|
[V1] Enable custom ops with piecewise CUDA graphs (#10228)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-11 11:58:07 -08:00 |
|
youkaichao
|
8a7fe47d32
|
[misc][distributed] auto port selection and disable tests (#10226)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 11:54:59 -08:00 |
|
Yuan Tang
|
4800339c62
|
Add docs on serving with Llama Stack (#10183)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-11 11:28:55 -08:00 |
|
Woosuk Kwon
|
fe15729a2b
|
[V1] Use custom ops for piecewise CUDA graphs (#10227)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-11 11:26:48 -08:00 |
|
youkaichao
|
330e82d34a
|
[v1][torch.compile] support managing cudagraph buffer (#10203)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-11 11:10:27 -08:00 |
|
Woosuk Kwon
|
d7a4f2207b
|
[V1] Do not use inductor for piecewise CUDA graphs (#10225)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-11 11:05:57 -08:00 |
|
Woosuk Kwon
|
f9dadfbee3
|
[V1] Fix detokenizer ports (#10224)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-11 10:42:07 -08:00 |
|
dependabot[bot]
|
25144ceed0
|
Bump actions/setup-python from 5.2.0 to 5.3.0 (#10209)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2024-11-11 17:24:10 +00:00 |
|
youkaichao
|
e6de9784d2
|
[core][distributed] add stateless process group (#10216)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 09:02:14 -08:00 |
|
Yangcheng Li
|
36fc439de0
|
[Doc] fix doc string typo in block_manager swap_out function (#10212)
|
2024-11-11 08:53:07 -08:00 |
|
harrywu
|
874f551b36
|
[Metrics] add more metrics (#4464)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-12 00:17:38 +08:00 |
|
Isotr0py
|
2cebda42bb
|
[Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-11 12:37:58 +00:00 |
|
Roger Wang
|
5fb1f935b0
|
[V1] Allow tokenizer_mode and trust_remote_code for Detokenizer (#10211)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-11 18:01:18 +08:00 |
|
Jee Jee Li
|
36e4acd02a
|
[LoRA][Kernel] Remove the unused libentry module (#10214)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-11 09:43:23 +00:00 |
|