Juan Pérez de Algaba
b111f8a61f
fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit ( #37952 )
...
Signed-off-by: jperezde <jperezde@redhat.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2026-03-27 09:02:10 -04:00
Bvicii
999dfc1622
[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop ( #34789 )
...
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-26 22:17:00 -07:00
Cyrus Leung
2e225f7bd2
[Renderer] Consolidate factory methods ( #38218 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-26 12:19:22 +00:00
wang.yuqi
dcdc145893
[CI] Reorganize scoring tests ( #38207 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-26 12:07:01 +00:00
Matej Rojec
2908094567
Add /v1/chat/completions/batch endpoint for batched chat completions ( #38011 )
...
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com >
2026-03-26 12:13:33 +08:00
Harry Mellor
3c3c084240
Various Transformers v5 fixes ( #38127 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-26 00:10:08 +00:00
Ekagra Ranjan
7b54f60db0
[Cohere] Enable Cohere-Transcribe ( #38120 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-25 16:13:51 -07:00
Nick Hill
72cad44d3c
[Frontend] Move APIServerProcessManager target server fn ( #38115 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-25 18:14:41 +00:00
Cyrus Leung
ba2f0acc2d
[Misc] Reorganize inputs ( #35182 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-25 10:22:54 -07:00
Harry Mellor
d215d1efca
[Mypy] Better fixes for the mypy issues in vllm/config ( #37902 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 06:14:43 -07:00
Harry Mellor
ba2910f73a
Fix offline mode test for Transformers v5 ( #38095 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 11:39:48 +00:00
Andreas Karatzas
f262a62aa1
[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test ( #37616 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 10:55:51 +00:00
Andreas Karatzas
9ac2fcafbb
[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors ( #37483 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 11:24:33 +01:00
Andreas Karatzas
04cec4f927
[ROCm][CI] Increase OpenAPI schema test timeouts ( #38088 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 18:06:58 +08:00
Dhruv Singal
4df5fa7439
[Bugfix] Force continuous usage stats when CLI override is enabled ( #37923 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
Co-authored-by: OpenCode <noreply@openai.com >
2026-03-24 10:29:50 -07:00
wang.yuqi
1b6cb920e6
[Deprecate] Deprecate pooling multi task support. ( #37956 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-24 14:07:47 +00:00
Ben Browning
3bbe2e1e6e
[Test] Consolidate tool parser unit tests to tests/tool_parsers ( #37834 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-23 04:24:25 +00:00
Andreas Karatzas
cd1242d82a
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold ( #37723 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 17:32:08 +08:00
Isotr0py
c7f98b4d0a
[Frontend] Remove librosa from audio dependency ( #37058 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-21 11:36:15 +08:00
Harry Mellor
6ade4bc5a5
Fix various config related issues for Transformers v5 ( #37681 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-20 16:30:12 +00:00
Flora Feng
b4c1aef21c
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ ( #37500 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 02:50:34 -07:00
Flora Feng
6050b93bed
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ ( #37595 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 02:10:47 -07:00
Flora Feng
e2d1c8b5e8
[Refactor] Relocate entrypoint tests to match serving code structure ( #37593 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 05:31:23 +00:00
Flora Feng
9040151fe1
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing ( #37612 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 11:31:43 +08:00
Ifta khairul Alam Adil
104605cbf2
Remove deprecated reasoning_content message field(part-2) ( #37480 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com >
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Philip Ottesen <phiott256@gmail.com >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: Andy Lo <andy@mistral.ai >
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com >
Signed-off-by: sihao.li <sihao.li@intel.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Philip Ottesen <phiott256@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com >
Co-authored-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com >
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 15:20:08 +00:00
Flora Feng
b21d384304
[Refactor] Relocate endpoint tests to mirror serving code directory structure ( #37504 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-19 07:19:36 +00:00
Chauncey
b322b197f1
[Build] Bump python openai version ( #32316 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-18 18:20:10 +08:00
Andreas Karatzas
8b6325758c
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture ( #37349 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-18 04:55:40 +00:00
Ekagra Ranjan
b5ca9c3557
[Models] Cohere ASR ( #35809 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-17 21:04:17 +00:00
Isotr0py
a836524d20
[Chore] Replace all base64 usages with faster pybase64 package ( #37290 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-17 14:44:19 +00:00
Bhoomit
3717a4dd47
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules ( #34984 )
...
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-17 14:36:41 +00:00
Sage
59192dfd39
[Frontend] Complete OpenAI render delegation ( #37287 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-17 13:53:55 +00:00
Viacheslav
293f036e6d
Add gigachat 3.1 tool parser + fix gigachat3 tool parser ( #36664 )
...
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com >
2026-03-17 12:03:20 +00:00
Sage
00f8e0d211
[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender ( #37266 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-17 11:22:54 +00:00
Chauncey
132bfd45b6
[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens ( #37258 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-17 08:54:52 +00:00
Flora Feng
3e3d320c1b
[Refactor] Relocate responses API tests ( #37241 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-17 05:14:52 +00:00
Flora Feng
384dc7f77b
[Refactor] Relocate completion and chat completion tests ( #37125 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-17 11:31:23 +08:00
Walter Beller-Morales
061980c36a
[Feature][Frontend] add support for Cohere Embed v2 API ( #37074 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-03-16 19:55:53 -04:00
Ben Browning
7a49742b88
[CI/Build] Add common tool call parser test suite ( #27599 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-16 19:46:20 -04:00
Flora Feng
dfa8852db2
[Refactor] Consolidate GPT-OSS reasoning parser tests ( #36915 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Signed-off-by: Flora Feng <4florafeng@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-16 15:53:07 -04:00
Max de Bayser
9f9ecff4cd
Add simple granite4 tool parser ( #36827 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2026-03-16 10:49:09 -07:00
Benjamin Bartels
0e5a9382af
[Bugfix] accept redacted thinking blocks in Anthropic messages ( #36992 )
...
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local >
Signed-off-by: bbartels <benjamin@bartels.dev >
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local >
2026-03-16 22:01:57 +08:00
Isotr0py
912fbe9555
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs ( #37147 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-16 08:56:06 +00:00
Andrew Xia
e9163b536e
[responsesAPI][ez] add a unit test for SimpleContext logprobs ( #37126 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2026-03-15 17:12:26 -07:00
Isotr0py
143e4dccdf
[Misc] Add online audio_in_video test ( #36775 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-15 00:14:11 -07:00
Sergey Zinchenko
4a718e770d
[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests ( https://github.com/vllm-project/vllm/issues/35665 ) ( #35684 )
2026-03-14 14:10:11 +00:00
Flora Feng
bcfdadb1bc
[Refactor] Relocate chat completion and anthropic tests ( #36919 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-14 12:16:16 +08:00
Andrew Xia
f680dc1b39
[responsesAPI] prioritize content over summary in reasoning item input ( #36516 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com >
2026-03-14 09:20:30 +08:00
Mark McLoughlin
7afe0faab1
[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish ( #36666 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 12:10:06 -07:00
Sage
a2268617cf
[Frontend] Delegate preprocessing to OpenAIServingRender ( #36483 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-13 00:39:43 -07:00