Commit Graph

319 Commits

Author SHA1 Message Date
7. Sun
0f19427db5 [Perf] Cache exc.errors() result in validation exception handler (#32984)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 02:01:35 -08:00
Nick Hill
7fe255889e [Misc] Log vLLM logo when starting server (#32796)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-23 11:15:12 +08:00
RickyChen / 陳昭儒
69d09fdd6c [Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-22 09:53:24 -08:00
Cyrus Leung
d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
wang.yuqi
4ae77dfd42 [Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
Chauncey
707b44cc28 [Refactor] [11/N] to simplify the mcp architecture (#32396)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-15 18:49:31 +08:00
Chauncey
4c1c501a7e [Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-15 07:41:34 +00:00
Chauncey
00e6402d56 [Frontend] track responsesAPI server_load (#32323)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-14 12:00:37 +00:00
Cyrus Leung
3f28174c6a [Frontend] Standardize use of create_error_response (#32319)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 11:22:26 +00:00
Chauncey
769d0629e1 [Refactor] [9/N] to simplify the vLLM openai translations serving ar chitecture (#32313)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-14 10:20:58 +00:00
Chauncey
9312a6c03a [Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-14 07:26:24 +00:00
Chauncey
fefce49807 [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-13 13:01:39 +00:00
Kevin Šuc
ac9f9330e6 Rename --exclude-log-deltas to --enable-log-deltas (#32020)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
2026-01-09 15:30:40 +00:00
Cyrus Leung
aa125ecf0e [Frontend] Improve error message (#31987)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 20:07:03 +00:00
R3hankhan
1ab055efe6 [OpenAI] Extend VLLMValidationError to additional validation parameters (#31870)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-01-07 14:45:49 +00:00
Kevin Šuc
79ed460dd5 [Frontend] [Doc] Exclude log deltas feature (#30322)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 16:34:35 +00:00
Hojin Yang
dc837bc23e feat(frontend): add --default-chat-template-kwargs CLI argument (#31343)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
2025-12-30 03:38:47 +00:00
RickyChen / 陳昭儒
b3a2bdf1ac [Feature] Add offline FastAPI documentation support for air-gapped environments (#30184)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-29 16:22:39 +00:00
R3hankhan
769f27e701 [OpenAI] Add parameter metadata to validation errors (#30134)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-12-23 11:30:12 +00:00
Jakub Zakrzewski
23daef548d [Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
Nathan Price
05a83dc6ee feat(api): Eager chat template warmup to eliminate first-request latency (#30700)
Signed-off-by: Nathan Price <nathan@abridge.com>
2025-12-18 00:01:29 +00:00
Chauncey
9ad5b21710 [Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Chauncey
2a1776b7ac [Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Tova Movshovitz
adb315060c [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170)
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 18:33:26 +00:00
Chauncey
3f42b05fbc [Refactor] [1/N] to simplify the vLLM serving architecture (#28040)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Zhuohan Li
d0cd728907 [Core] Support reseting all running requests' KV while calling reset_prefix_cache (#28827)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-02 02:25:05 +00:00
sangbumlikeagod
092bb73b8a [Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2025-12-01 18:19:17 +01:00
wang.yuqi
62de4f4257 [Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-01 15:30:43 +08:00
HappyAmazonian
f8151b66fa Revert "Supress verbose logs from model_hosting_container_standards (… (#29335)
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 05:29:05 -08:00
Samit
371b1d4c61 [RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037)
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-20 03:01:03 -08:00
Michael Goin
67745d189f Supress verbose logs from model_hosting_container_standards (#28949)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-18 12:29:06 -08:00
Zhuohan Li
dd6ac1c2bb [RL] [V1] Remove unused device argument from reset_kv_cache (#28766)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-11-14 23:59:42 -08:00
Nicolò Lucchesi
6f1e7f7226 [DisaggEverything] Tokens in<>out /generate endpoint (#24261)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 09:58:01 -07:00
Srreyansh Sethi
360bd8762f [Frontend] Added chat-style multimodal support to /classify. (#27516)
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-14 11:03:55 +00:00
Zuyi Zhao
bca74e32b7 [Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892)
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-11 04:57:01 +00:00
Jialin Ouyang
b30372cbd0 [Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-10 15:34:18 -08:00
Benjamin Chislett
975676d174 [Feat] Drop-in Torch CUDA Profiler (#27841)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-08 14:07:37 -08:00
Vico Chu
d4aa65c998 [Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792)
Signed-off-by: Vico Chu <vico24826@gmail.com>
2025-11-06 19:09:19 +00:00
Roy Wang
d1dd5f53e4 [Frontend] Fix logging format when enable response logging (#28049)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2025-11-06 16:25:39 +00:00
Walter Beller-Morales
752ddeacaa [Core] add support for reasoning parser plugins (#28075)
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>
2025-11-06 01:15:06 +08:00
Chauncey
e261d37c9a [Refactor] Lazy-loaded reasoning_parser (#28092)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 15:37:02 +08:00
wangxiyuan
428bc7bf1c [V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-04 20:51:16 -08:00
Chauncey
c02fccdbd2 [Refactor] Lazy import tool_parser (#27974)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-04 10:10:10 +08:00
Misha Efimov
ba464e6ae2 Add ORCA endpoint load metrics support (#24905)
Signed-off-by: Misha Efimov <mef@google.com>
2025-11-03 08:21:31 +00:00
Benjamin Bartels
1e88fb751b Adds anthropic /v1/messages endpoint to openai api_server (#27882)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
2025-11-01 12:45:42 -07:00
Nick Hill
9e5bd3076e [Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill (#27826)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-31 10:57:45 -07:00
wang.yuqi
4464723f22 [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-30 12:13:05 +00:00
Cyrus Leung
f58d9b6404 [Misc] Separate out utils.counter and move utils.Device to engine (#27588)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-28 12:20:46 +00:00