Guillaume Calmettes
|
36fe78769f
|
[Bugfix] validate urls object for multimodal content parts (#16990)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-04-23 09:43:06 +08:00 |
|
Nicolò Lucchesi
|
9d4ca19d50
|
[Misc] Benchmarks for audio models (#16505)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-19 02:24:14 -07:00 |
|
Nicolò Lucchesi
|
2ef0dc53b8
|
[Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
|
2025-04-19 07:03:54 +00:00 |
|
wang.yuqi
|
3d3ab3689f
|
[New Model]: Snowflake Arctic Embed (Family) (#16649)
|
2025-04-18 08:11:57 -07:00 |
|
Harry Mellor
|
686623c5e7
|
Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-18 05:58:39 -07:00 |
|
Harry Mellor
|
e78587a64c
|
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 22:13:32 -07:00 |
|
Tarun Kumar
|
e37073efd7
|
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721)
Signed-off-by: Tarun Kumar <takumar@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 21:08:27 -07:00 |
|
Angky William
|
fdcb850f14
|
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
|
2025-04-15 22:31:38 +00:00 |
|
Ryan McConville
|
6c11ecf8d3
|
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529)
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
|
2025-04-12 20:19:19 +00:00 |
|
Cyrus Leung
|
d9fc8cd9da
|
[V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-12 08:52:39 +00:00 |
|
wang.yuqi
|
fbf722c6e6
|
[Frontend] support matryoshka representation / support embedding API dimensions (#16331)
|
2025-04-11 23:23:10 -07:00 |
|
Russell Bryant
|
9665313c39
|
[V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-10 17:53:26 +00:00 |
|
Michael Goin
|
87b4ac56c2
|
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-09 04:14:46 +00:00 |
|
Cyrus Leung
|
4ebc0b9640
|
[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-08 09:45:21 -07:00 |
|
Matthias Matt
|
cefb9e5a28
|
[Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-04-02 07:45:45 -07:00 |
|
Mark McLoughlin
|
98d7367b61
|
[Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-02 07:37:19 -07:00 |
|
Chauncey
|
594a8b9030
|
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-02 06:33:52 -07:00 |
|
Eric Tang
|
ddb94c2605
|
[core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
|
2025-04-02 01:59:27 -07:00 |
|
Varun Sundar Rabindranath
|
79455cf421
|
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-04-01 16:53:56 +08:00 |
|
pansicheng
|
7fd8c0f85c
|
fix test_phi3v (#15321)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
|
2025-03-30 02:01:34 -07:00 |
|
Varun Sundar Rabindranath
|
1286211f57
|
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-28 21:10:41 -07:00 |
|
Lize Cai
|
a10314c6b3
|
[Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 18:00:14 +08:00 |
|
Ce Gao
|
32b14baf8a
|
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 00:23:30 -07:00 |
|
Alex Brooks
|
1711b929b6
|
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
|
2025-03-26 14:28:07 +00:00 |
|
Cyrus Leung
|
cbcdf2c609
|
[Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-24 13:50:09 +00:00 |
|
Robin
|
d6cd59f122
|
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-23 14:00:07 -07:00 |
|
Cyrus Leung
|
f690372b68
|
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 13:49:33 +08:00 |
|
Sibi
|
a73e183e36
|
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-16 20:35:57 -07:00 |
|
Jun Duan
|
74bc397b0a
|
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
|
2025-03-15 06:28:14 -07:00 |
|
Robert Shaw
|
d4d93db2c5
|
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-03-14 22:02:20 -07:00 |
|
daniel-salib
|
73deea2fdb
|
[Frontend] track server_load (#13950)
|
2025-03-14 09:53:17 -07:00 |
|
Cyrus Leung
|
33f227e16b
|
[CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 11:30:09 +00:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Nicolò Lucchesi
|
fa82b93853
|
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-06 10:39:35 +00:00 |
|
Benjamin Chislett
|
32985bed7c
|
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-05 06:30:40 +00:00 |
|
Mark McLoughlin
|
ae122b1cbd
|
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 19:04:45 +00:00 |
|
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |
|
Harry Mellor
|
4be4b26cb7
|
Fix entrypoint tests for embedding models (#14052)
|
2025-02-28 08:56:44 -08:00 |
|
Harry Mellor
|
76c89fcadd
|
Use smaller embedding model when not testing model specifically (#13891)
|
2025-02-28 00:50:43 -08:00 |
|
Mark McLoughlin
|
cd711c48b2
|
[V1][Metrics] Handle preemptions (#13169)
|
2025-02-26 20:04:59 -08:00 |
|
Cyrus Leung
|
934bb99c71
|
[Bugfix] Update expected token counts for Ultravox tests (#13895)
|
2025-02-26 04:56:50 -08:00 |
|
Jee Jee Li
|
5157338ed9
|
[Misc] Improve LoRA spelling (#13831)
|
2025-02-25 23:43:01 -08:00 |
|
Mark McLoughlin
|
1cd981da4f
|
[V1][Metrics] Support vllm:cache_config_info (#13299)
|
2025-02-22 00:20:00 -08:00 |
|
Keyun Tong
|
0ffdf8ce0c
|
[HTTP Server] Make model param optional in request (#13568)
|
2025-02-21 21:55:50 -08:00 |
|
Gabriel Marinho
|
1c3c975766
|
[FEATURE] Enables /score endpoint for embedding models (#12846)
|
2025-02-20 22:09:47 -08:00 |
|
youkaichao
|
ba81163997
|
[core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
|
2025-02-20 12:41:17 +08:00 |
|
Kevin H. Luu
|
d5d214ac7f
|
[1/n][CI] Load models in CI from S3 instead of HF (#13205)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
|
2025-02-19 07:34:59 +00:00 |
|
Mark McLoughlin
|
2ad1bc7afe
|
[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288)
|
2025-02-15 03:56:19 -08:00 |
|
Alexander Matveev
|
45f90bcbba
|
[WIP] TPU V1 Support Refactored (#13049)
|
2025-02-14 00:21:53 -08:00 |
|