Commit Graph

553 Commits

Author SHA1 Message Date
PlatinumGod
6a09612b2e [Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867)
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 09:34:27 +08:00
inkcherry
500f26e6d3 [Bugfix] fix DP-aware routing in OpenAI API requests (#29002)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
Matthew Bonanni
7eb6cb6c18 [Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd [CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-17 18:49:56 +01:00
Chauncey
9ad5b21710 [Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Nicolò Lucchesi
ca702a14dc [Frontend] Add max-completion-token option to transcription/translation endpoints (#30769)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Andrew Xia
0d0c929f23 [responsesAPI][8] input/output messages for ResponsesParser (#30158)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
penfree
bbd850e597 [Bugfix] fix streaming final output for non harmony (#30237)
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Chauncey
2a1776b7ac [Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Wenqi Glantz
84e23d103d additional protection for CVE-2025-62164 (#30649)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Cyrus Leung
dcb31196da [Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Cyrus Leung
64251f48df [Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Benjamin Bartels
f3237f3f6b [Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-12 16:28:54 +00:00
Ben Browning
8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-12 12:59:53 +08:00
Will Eaton
a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
daniel-salib
444f0e3f33 [Frontend] Add MCP type support infrastructure to Responses API (#30054)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-08 10:02:52 +08:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Andrew Xia
421125d03a [ez] move harmony utils to parser folder (#30117)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Viacheslav
21bb323542 Gigachat 3 tool parser and tests (#29905)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Deboleina
02a4169193 [Tests] Tool call tests for openai/gpt-oss-20b (#26237)
Signed-off-by: Debolina Roy <debroy@redhat.com>
2025-12-05 19:03:29 -08:00
Nicolò Lucchesi
e23ca3a0e8 [CI] Re-use whisper_client for all tests (#30148)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-05 19:47:37 +00:00
Andrew Xia
da7bc54ea8 [responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-05 11:11:50 -05:00
Cyrus Leung
9ae2f60374 [Misc] Various cleanups for MM input processing (#29970)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Benjamin Bartels
fca3f46658 [Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
Chauncey
3f42b05fbc [Refactor] [1/N] to simplify the vLLM serving architecture (#28040)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Andrew Xia
3a7751485b [responsesAPI] support input output messages for non harmony models (#29549)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 23:59:23 -08:00
Andrew Xia
52cb349fc0 [responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 11:24:45 -05:00
ImaGoodFella
60c3d413af [Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621)
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 21:49:02 +08:00
Cyrus Leung
653591d5e7 [Chore] Move tokenizer initialization methods (#29793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 13:33:37 +08:00
sangbumlikeagod
092bb73b8a [Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2025-12-01 18:19:17 +01:00
Cyrus Leung
f0a28bf661 [Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
daniel-salib
014ece97c7 [Frontend] Add tool filtering support to ToolServer (#29224)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
wang.yuqi
62de4f4257 [Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-01 15:30:43 +08:00
Jee Jee Li
b9d0504a36 [Bugfix] Revert test_tokenization.py (#29729)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-29 16:35:15 +00:00
Cyrus Leung
34a984274e [Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Jee Jee Li
39e63dec7c [LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00
Cyrus Leung
b2c50eda50 [Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 10:30:41 +08:00
Cyrus Leung
8d9338fae4 [Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 09:35:41 -08:00
Nicolò Lucchesi
e5a621b724 [CI] Add batched audios Whisper test (#29308)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-27 19:31:52 +00:00
Mark McLoughlin
9cf4edae6e [Metrics] Scheduled removal of deprecated metrics (#29330)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-25 11:15:13 +08:00
Nick Hill
84371daf75 [Tests] Verify gpt_oss package is installed in harmony tests (#29336)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-24 22:04:31 +00:00
Aydin Abiar
656516c315 [Bugfix] properly handle nested json with llama3 tool parser (#27701)
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-24 15:28:51 +00:00
Andrew Xia
742e9ff6b3 [responsesAPI] parse reasoning item input (#28248)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 15:42:11 +08:00
Kevin H. Luu
c64c0b78de [chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Isotr0py
896e41ae04 [CI/Build] Replace wikipedia url with local server ones (#28908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:10:55 +00:00
Cyrus Leung
bf9e1e8767 [Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields (#28872)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-17 20:30:29 -08:00
Nicolò Lucchesi
6f1e7f7226 [DisaggEverything] Tokens in<>out /generate endpoint (#24261)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 09:58:01 -07:00
Andrew Xia
7c38ed0f1c [Frontend] split append tool output (#28333)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-11-13 04:03:23 +00:00
wangxiyuan
e1710393c4 [[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-11 18:22:16 -07:00