847 Commits

Author SHA1 Message Date
R3hankhan
1ab055efe6 [OpenAI] Extend VLLMValidationError to additional validation parameters (#31870)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-01-07 14:45:49 +00:00
Kevin Šuc
79ed460dd5 [Frontend] [Doc] Exclude log deltas feature (#30322)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 16:34:35 +00:00
cjackal
e2701cc525 [Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser (#31581)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-05 05:42:47 +00:00
Ning Xie
6522721d17 [misc] Sort uvicorn log level description according to verbosity (#31137)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-04 18:45:37 +00:00
Nick Hill
3b312fb792 [Minor] Various small code cleanups/simplifications (#31508)
Signed-off-by: njhill <nickhill123@gmail.com>
2025-12-29 22:42:06 -08:00
Hojin Yang
dc837bc23e feat(frontend): add --default-chat-template-kwargs CLI argument (#31343)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
2025-12-30 03:38:47 +00:00
RickyChen / 陳昭儒
b3a2bdf1ac [Feature] Add offline FastAPI documentation support for air-gapped environments (#30184)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-29 16:22:39 +00:00
amittell
9c884faa95 [Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438)
Signed-off-by: amittell <mittell@me.com>
Signed-off-by: Alex Mittell <mittell@me.com>
2025-12-29 21:10:52 +08:00
Michael Goin
bc5ef333e0 [Perf] Add skip_clone to SamplingParams for internal request handling (#31041)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-24 14:35:57 -08:00
Cyrus Leung
09dc7c690c [Chore][1/2] Drop v0.14 deprecations (#31285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 09:54:01 -08:00
R3hankhan
769f27e701 [OpenAI] Add parameter metadata to validation errors (#30134)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-12-23 11:30:12 +00:00
Jakub Zakrzewski
23daef548d [Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
AlonKejzman
bd6d5a7475 [gpt-oss] Fix harmony parser in streaming responses (#30205)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
2025-12-22 20:56:06 +08:00
汪志鹏
3e92b2b7ac [BugFix]fix gpt-oss v1/completions response bug (#30608)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
2025-12-21 10:39:31 +08:00
Marko Rosenmueller
455949675d [Frontend][Bug] allow tool calls in analysis channel (#28139)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 10:47:44 +00:00
lif
086b96339f [Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:23:28 +08:00
PlatinumGod
6a09612b2e [Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867)
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 09:34:27 +08:00
inkcherry
500f26e6d3 [Bugfix] fix DP-aware routing in OpenAI API requests (#29002)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
Nathan Price
fc2ae6d617 fix: add warmup for audio preprocessing (#30706)
Signed-off-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-18 06:12:29 +00:00
Nathan Price
05a83dc6ee feat(api): Eager chat template warmup to eliminate first-request latency (#30700)
Signed-off-by: Nathan Price <nathan@abridge.com>
2025-12-18 00:01:29 +00:00
Chauncey
9ad5b21710 [Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Andrew Xia
4c054d89aa [Doc][ResponsesAPI] add documentation (#30840)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-17 01:53:02 -08:00
Nicolò Lucchesi
ca702a14dc [Frontend] Add max-completion-token option to transcription/translation endpoints (#30769)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Andrew Xia
0d0c929f23 [responsesAPI][8] input/output messages for ResponsesParser (#30158)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
penfree
bbd850e597 [Bugfix] fix streaming final output for non harmony (#30237)
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
mondaylord
17fec3af09 [Bugfix] Fix missing first token in tool calls during reasoning-to-tool transition (#30671)
Signed-off-by: mondaylord <20212010046@fudan.edu.cn>
2025-12-15 16:13:37 +00:00
Chauncey
2a1776b7ac [Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Jee Jee Li
a524d1ba0a [Bugfix] Fix deepseek_v32 tokenizer_mode (#30658)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 04:20:31 +00:00
RioS
9ccbf6b692 [responsesAPI]add extra body parameters (#30532)
Signed-off-by: Ri0S <aa248424@gmail.com>
2025-12-14 19:25:45 +00:00
Didier Durand
1a55cfafcb [Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Cyrus Leung
dcb31196da [Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Cyrus Leung
64251f48df [Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Ben Browning
8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-12 12:59:53 +08:00
Kenichi Maehashi
853611bb18 Fix typo of endpoint name in CLI args docs (#30473)
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
2025-12-11 11:07:56 +00:00
Will Eaton
a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Andrew Xia
c3487aca34 [responsesAPI][6] Fix multi turn MCP tokenization (#30230)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-10 10:13:13 +08:00
Julien Denize
5c213d2899 [BUGFIX] Mistral tool call parser v11+ (#30332)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-09 14:55:38 +00:00
daniel-salib
444f0e3f33 [Frontend] Add MCP type support infrastructure to Responses API (#30054)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-08 10:02:52 +08:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Andrew Xia
421125d03a [ez] move harmony utils to parser folder (#30117)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Viacheslav
21bb323542 Gigachat 3 tool parser and tests (#29905)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Tova Movshovitz
adb315060c [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170)
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 18:33:26 +00:00
Andrew Xia
da7bc54ea8 [responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-05 11:11:50 -05:00
Peng-YM
48a5fff66e [Bugfix] Missing tokens in return_token_ids when tool parsers is enabled in streaming mode (#29074)
Signed-off-by: Peng-YM <1048217874pengym@gmail.com>
2025-12-04 19:09:39 +00:00
Xu Wenqing
ffdd18111b Add DeepSeek-V3.2 tool parser. (#29848)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Cyrus Leung
9ae2f60374 [Misc] Various cleanups for MM input processing (#29970)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
avigny
dd5d1ef780 [Bugfix] Mistral tool parser streaming update (#19425)
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 17:45:31 +00:00
Chauncey
b78772c433 [Frontend] supports deepseekv32 chat template (#29837)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Chauncey
3f42b05fbc [Refactor] [1/N] to simplify the vLLM serving architecture (#28040)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00