Nick Hill
|
55aa7af994
|
[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-13 10:48:21 -07:00 |
|
Cyrus Leung
|
b922c2ebd2
|
[Bugfix] Fix entrypoints metrics tests (#18063)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-13 06:42:43 -07:00 |
|
Aaron Pham
|
cb528d0585
|
[Fix] check to make sure processor has chat templates (#18047)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-13 03:04:10 -07:00 |
|
Cyrus Leung
|
61e0a506a3
|
[Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-12 22:40:19 -07:00 |
|
Robert Shaw
|
d19110204c
|
[P/D] NIXL Integration (#17751)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>
|
2025-05-12 09:46:16 -07:00 |
|
Maximilien de Bayser
|
05a4324f8e
|
Initialize the delta tool call fields explicitly (#17340)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: igmainc <igmainc@icloud.com>
|
2025-05-12 13:28:58 +00:00 |
|
Xu Wenqing
|
3a5ea75129
|
[Feature] Support DeepSeekV3 Function Call (#17784)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-05-12 00:45:21 -07:00 |
|
Li Wang
|
19a3c78d1f
|
[Bugfix] Fix pydantic.errors.PydanticUserError (#17962)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-05-12 12:58:23 +08:00 |
|
Reid
|
ada50aa295
|
[bugfix] fix the wrong parser (#17958)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-12 04:58:02 +00:00 |
|
Isotr0py
|
021c16c7ca
|
[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-11 17:56:30 -07:00 |
|
Frieda Huang
|
9cea90eab4
|
[Frontend] Add /classify endpoint (#17032)
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
|
2025-05-11 07:57:07 +00:00 |
|
Ximo Guanter
|
fc4441a4ee
|
Add missing content type headers to /ping and /health (#17036) (#17786)
Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-10 07:13:32 +01:00 |
|
Harry Mellor
|
4b2ed7926a
|
Improve configs - the rest! (#17562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 15:18:44 -07:00 |
|
Harry Mellor
|
7d4aedae7c
|
Handle error when str passed to /v1/audio/transcriptions (#17909)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 19:23:59 +00:00 |
|
Harry Mellor
|
c6798baa9c
|
Change top_k to be disabled with 0 (still accept -1 for now) (#17773)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 10:01:49 +00:00 |
|
Cyrus Leung
|
96722aa81d
|
[Frontend] Chat template fallbacks for multimodal models (#17805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-07 23:05:54 -07:00 |
|
Harry Mellor
|
998eea4a0e
|
Only log non-default CLI args for online serving (#17803)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-07 22:33:29 -07:00 |
|
Chauncey
|
5394ad7387
|
[Bugfix] fix KeyError on top logprobs are special tokens (#17637)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-04 19:22:35 -07:00 |
|
Harry Mellor
|
d6484ef3c3
|
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-03 19:42:43 -07:00 |
|
Cyrus Leung
|
cb234955df
|
[Misc] Clean up input processing (#17582)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 08:11:53 -07:00 |
|
Chauncey
|
98060b001d
|
[Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-01 06:46:16 -07:00 |
|
Cyrus Leung
|
1903c0b8a3
|
[Frontend] Show progress bar for adding requests (#17525)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 05:15:32 -07:00 |
|
Marko Rosenmueller
|
77073c77bc
|
[Core] Prevent side-channel attacks via cache salting (#17045)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-04-30 20:27:21 +08:00 |
|
Marco
|
54072f315f
|
[MODEL ADDITION] Ovis2 Model Addition (#15826)
Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-04-30 07:33:29 +00:00 |
|
Chauncey
|
be633fba0f
|
[Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (#17434)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-30 00:11:04 -07:00 |
|
Harry Mellor
|
13698db634
|
Improve configs - ModelConfig (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-30 10:38:22 +08:00 |
|
Gabriel Marinho
|
1c2bc7ead0
|
Truncation control for embedding models (#14776)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-04-30 09:24:57 +08:00 |
|
Cyrus Leung
|
88ad9ec6b2
|
[Frontend] Support chat_template_kwargs in LLM.chat (#17356)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-29 22:03:35 +08:00 |
|
Harry Mellor
|
f94886946e
|
Improve conversion from dataclass configs to argparse arguments (#17303)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-28 16:22:12 +00:00 |
|
Alex Brooks
|
fa93cd9f60
|
[Model] Add Granite Speech Support (#16246)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-04-28 10:05:00 +00:00 |
|
Russell Bryant
|
f8acd01ff7
|
[V1] Add structural_tag support using xgrammar (#17085)
|
2025-04-26 14:06:37 +00:00 |
|
Nick Hill
|
70116459c3
|
[BugFix][Frontend] Fix LLM.chat() tokenization (#16081)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:20:05 +00:00 |
|
Jasmond L
|
d5615af9ae
|
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769)
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-25 07:26:30 -07:00 |
|
Cyrus Leung
|
19dcc02a72
|
[Bugfix] Fix mistral model tests (#17181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 06:03:34 -07:00 |
|
Alex Brooks
|
7feae92c1f
|
[Doc] Move todo out of beam search docstring (#17183)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-04-25 04:44:58 -07:00 |
|
Maximilien de Bayser
|
05e1fbfc52
|
Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-04-24 20:19:36 +00:00 |
|
Harry Mellor
|
0a05ed57e6
|
Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:43:56 -07:00 |
|
Chauncey
|
8c87a9ad46
|
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-23 07:24:09 +00:00 |
|
Guillaume Calmettes
|
36fe78769f
|
[Bugfix] validate urls object for multimodal content parts (#16990)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-04-23 09:43:06 +08:00 |
|
Reid
|
f34410715f
|
[frontend] enhance tool_calls type check (#16882)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-22 15:40:24 +00:00 |
|
Isotr0py
|
83f3c3bd91
|
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-19 02:26:11 -07:00 |
|
Nicolò Lucchesi
|
2ef0dc53b8
|
[Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
|
2025-04-19 07:03:54 +00:00 |
|
Yang Fan
|
2c1bd848a6
|
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
|
2025-04-18 23:14:36 -07:00 |
|
rongfu.leng
|
7bdfd29a35
|
[Misc] add collect_env to cli and docker image (#16759)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-17 22:13:35 -07:00 |
|
Mark McLoughlin
|
e4755f7fac
|
[V1][Metrics] Fix http metrics middleware (#15894)
|
2025-04-17 19:52:18 +00:00 |
|
Robin
|
6211b92273
|
[Bugfix]Fix index out of range error in api server log (#16787)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-04-17 09:01:07 -07:00 |
|
Nick Hill
|
05fcd1b430
|
[V1][Perf] Faster incremental detokenization (#15137)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 07:45:24 -07:00 |
|
Robert Shaw
|
2b05b8ce69
|
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:48:34 -07:00 |
|
Angky William
|
fdcb850f14
|
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
|
2025-04-15 22:31:38 +00:00 |
|
Xihui Cang
|
1666e66443
|
Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572)
Signed-off-by: Xihui Cang <xihuicang@gmail.com>
|
2025-04-15 11:50:38 +00:00 |
|