Ben Browning
8477fe427d
[Tool] adjust_request to reasoning parser, and Gemma4 fixes ( #39027 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-04-08 19:04:04 +00:00
Vedant V Jhaveri
2e56975657
Generative Scoring ( #34539 )
...
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com >
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-31 16:02:11 -07:00
wang.yuqi
ed359c497a
[Model] Deprecate the score task (this will not affect users). ( #37537 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-20 08:07:56 +00:00
Flora Feng
9040151fe1
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing ( #37612 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 11:31:43 +08:00
Sage
00f8e0d211
[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender ( #37266 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-17 11:22:54 +00:00
Chauncey
6682c231fa
[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing ( #37148 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-16 16:27:47 +00:00
Sergey Zinchenko
4a718e770d
[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests ( https://github.com/vllm-project/vllm/issues/35665 ) ( #35684 )
2026-03-14 14:10:11 +00:00
Sage
06e0bc21d2
[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + OpenAIServingModels ( #36536 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-12 03:29:37 -07:00
Sage
4497431df6
[Frontend] Add GPU-less render serving path (vllm launch render) ( #36166 )
2026-03-08 16:35:09 +01:00
Ning Xie
176c799f4c
[openai api] log exception in exception handler (1/N) ( #31164 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-05 16:00:12 +00:00
Hyunkyun Moon
bc6be89d16
[Frontend] Add vllm launch command for GPU-less preprocessing serving ( #34551 )
...
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com >
2026-03-04 18:41:52 +00:00
pougetat
1659b2e058
[Feature] Add basic metrics for /realtime endpoint ( #35500 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Signed-off-by: pougetat <thomas.pougetabadie@gmail.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 19:56:32 +08:00
wang.yuqi
dab1de9f38
[Frontend][CI] Consolidate instrumentator entrypoints ( #34123 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-10 07:30:19 +00:00
kourosh hakhamaneshi
a75a5b54c7
[bug-fix] supported_tasks is breaking backward compatibility at init_app_state ( #34027 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-09 09:46:46 +08:00
emricksini-h
325ab6b0a8
[Feature] OTEL tracing during loading ( #31162 )
2026-02-05 16:59:28 -08:00
Nicolò Lucchesi
20f5d185a6
[Misc] Rename translations to speech_to_text for OAI serving component ( #33904 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 19:16:52 +00:00
Patrick von Platen
10152d2194
[Realtime API] Adds minimal realtime API based on websockets ( #33187 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-30 18:41:29 +08:00
wang.yuqi
7cbbca9aaa
[Frontend] Cleanup api server ( #33158 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2026-01-27 15:18:10 +00:00
wang.yuqi
76139d0801
[Frontend] Frontend will only attach supported tasks corresponding entrypoints. ( #33139 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-27 12:15:43 +00:00
Jared Wen
6ee7f18f33
[Logging] add --disable-access-log-for-endpoints CLI option ( #30011 )
...
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).
This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.
Fixes #29982
Signed-off-by: JaredforReal <w13431838023@gmail.com >
2026-01-26 21:49:03 +00:00
7. Sun
0f19427db5
[Perf] Cache exc.errors() result in validation exception handler ( #32984 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 02:01:35 -08:00
Nick Hill
7fe255889e
[Misc] Log vLLM logo when starting server ( #32796 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-23 11:15:12 +08:00
RickyChen / 陳昭儒
69d09fdd6c
[Feature] Add --ssl-ciphers CLI argument for TLS cipher control ( #30937 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
2026-01-22 09:53:24 -08:00
Cyrus Leung
d117a4d1a9
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) ( #30200 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-22 12:44:22 +00:00
wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest ( #32395 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-16 06:17:04 +00:00
Chauncey
707b44cc28
[Refactor] [11/N] to simplify the mcp architecture ( #32396 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-15 18:49:31 +08:00
Chauncey
4c1c501a7e
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture ( #32369 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-15 07:41:34 +00:00
Chauncey
00e6402d56
[Frontend] track responsesAPI server_load ( #32323 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 12:00:37 +00:00
Cyrus Leung
3f28174c6a
[Frontend] Standardize use of create_error_response ( #32319 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 11:22:26 +00:00
Chauncey
769d0629e1
[Refactor] [9/N] to simplify the vLLM openai translations serving architecture ( #32313 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 10:20:58 +00:00
Chauncey
9312a6c03a
[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture ( #32260 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 07:26:24 +00:00
Chauncey
fefce49807
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture ( #32240 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-13 13:01:39 +00:00
Kevin Šuc
ac9f9330e6
Rename --exclude-log-deltas to --enable-log-deltas ( #32020 )
...
Signed-off-by: Catacomba <kevinsuc16@gmail.com >
2026-01-09 15:30:40 +00:00
Cyrus Leung
aa125ecf0e
[Frontend] Improve error message ( #31987 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-08 20:07:03 +00:00
R3hankhan
1ab055efe6
[OpenAI] Extend VLLMValidationError to additional validation parameters ( #31870 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-01-07 14:45:49 +00:00
Kevin Šuc
79ed460dd5
[Frontend] [Doc] Exclude log deltas feature ( #30322 )
...
Signed-off-by: Catacomba <kevinsuc16@gmail.com >
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-05 16:34:35 +00:00
Hojin Yang
dc837bc23e
feat(frontend): add --default-chat-template-kwargs CLI argument ( #31343 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
2025-12-30 03:38:47 +00:00
RickyChen / 陳昭儒
b3a2bdf1ac
[Feature] Add offline FastAPI documentation support for air-gapped environments ( #30184 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-29 16:22:39 +00:00
R3hankhan
769f27e701
[OpenAI] Add parameter metadata to validation errors ( #30134 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-12-23 11:30:12 +00:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models ( #30550 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-23 11:19:16 +00:00
Nathan Price
05a83dc6ee
feat(api): Eager chat template warmup to eliminate first-request latency ( #30700 )
...
Signed-off-by: Nathan Price <nathan@abridge.com >
2025-12-18 00:01:29 +00:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory ( #30749 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-17 02:27:30 -08:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-15 12:54:52 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-06 23:15:42 -08:00
Tova Movshovitz
adb315060c
[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache ( #27170 )
...
Signed-off-by: tovam <tovam@pliops.com >
Signed-off-by: Tova Movshovitz <tovam@pliops.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 18:33:26 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture ( #28040 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-03 01:26:39 -08:00
Zhuohan Li
d0cd728907
[Core] Support reseting all running requests' KV while calling reset_prefix_cache ( #28827 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-02 02:25:05 +00:00
sangbumlikeagod
092bb73b8a
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation ( #24209 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
2025-12-01 18:19:17 +01:00
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints ( #29634 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-01 15:30:43 +08:00