Commit Graph

790 Commits

Author SHA1 Message Date
Kata Coder
5719a4e4e6 [Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
2026-02-20 20:01:40 -08:00
Vlad Tiberiu Mihailescu
e739c29ea4 [CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2026-02-20 19:54:55 -08:00
junuxyz
c61a98f529 [CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
2026-02-17 12:22:56 +00:00
ChenqianCao
ad65177a19 [Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo (#32922)
Signed-off-by: ChenqianCao <39755070+ChenqianCao@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-17 10:06:53 +00:00
Christian Pinto
6930becd45 (bugfix): Fixed encode in LLM entrypoint for IOProcessr plugin prompts (#34618)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2026-02-16 07:33:55 -08:00
Christian Pinto
342a7cda2d [Misc] Update tests and examples for Prithvi/Terratorch models (#34416)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1 [new model] add COLQwen3 code & Inference (#34398)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
Signed-off-by: katacoder <craftsangjae@gmail.com>
2026-02-14 12:15:19 +08:00
Ilya Boytsov
071d863e20 Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
2026-02-13 09:53:09 +00:00
Aaron Hao
dddbff4624 [Core] Move pause and resume functions into engine (#34125)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-13 00:15:10 -08:00
AllenDou
21dfb842d7 [model] support FunASR model (#33247)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-02-11 07:37:09 +00:00
Lucas Wilkinson
ba0511fd80 [Misc] Add run one batch script that supports profiling (#32968)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-10 18:29:49 -08:00
zzaebok
cbea11c9f0 [Docs] Fix format error in KV load failure recovery doc (#34137)
Signed-off-by: Jaebok Lee <jaebok9541@naver.com>
2026-02-10 02:16:26 -08:00
Cyrus Leung
2c32558a3c [Bugfix] Fix --trust-remote-code conflict (#34218)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-10 00:29:10 -08:00
Cyrus Leung
25e48a3aae [Doc] Update usage of --limit-mm-per-prompt (#34148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-09 21:12:13 -08:00
wang.yuqi
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
Aaron Hao
89a385d79f [Feat][RL] Pause and Resume with keep requests for single engine (#32351)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-07 00:08:58 +00:00
SorenDreano
6e7b1c4b59 [Docs] Improve documentation (#33799)
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-02-06 12:57:09 +00:00
Benjamin Chislett
af3162d3aa [Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-05 12:37:18 -05:00
Aaron Hao
c1858b7ec8 [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
2026-02-05 12:13:23 -05:00
Ilya Boytsov
439afa4eea feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-05 08:05:13 +08:00
wang.yuqi
1b8fe6f7c4 [Frontend][4/n] Make pooling entrypoints request schema consensus | ScoreRequest (#33060)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-04 01:48:40 +00:00
Patrick von Platen
3f7662d650 [Voxtral Realtime] Change name (#33716)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-02-03 13:03:28 -08:00
dtc
0d6ccf68fa [P/D] rework mooncake connector and introduce its bootstrap server (#31034)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-02-03 08:08:25 -08:00
zxy
a3acfa1071 [Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-03 05:49:45 -08:00
Cyrus Leung
83449a5ff0 [Refactor] Clean up pooling serial utils (#33665)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-03 10:29:18 +00:00
Yang Liu
199e3cb476 [Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039)
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-02-02 16:55:48 +00:00
Isotr0py
4061dcf4c5 [Bugfix] Enable Kimi k25 processor test (#33562)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 14:25:25 +00:00
Komal Kumar Teru
ba871fb788 [Misc] support arbitrary MM datasets in spec dec bench (#33486)
Signed-off-by: kkt-cohere <komal@cohere.com>
Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-02 08:49:48 +00:00
RED
808dd87b30 [Model] Support DeepSeek-OCR-2 (#33165)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 06:24:10 +00:00
Cyrus Leung
f0a1c8453a [Frontend] Use new Renderer for Completions and Tokenize API (#32863)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 04:51:15 -08:00
Michael Goin
29fba76781 [UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-31 12:14:54 +08:00
Isotr0py
9df152bbf6 [Misc] Algin Qwen3-VL-embedding image example outputs with HF repo example (#33419)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-30 19:36:56 -08:00
Patrick von Platen
10152d2194 [Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-30 18:41:29 +08:00
hujiaxin0
ba45bedfd1 [model] Add support for openPangu7B-VL (#32449)
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
2026-01-30 15:54:27 +08:00
Harry Mellor
9432ed8c7e Explicitly set return_dict for apply_chat_template (#33372)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 07:27:04 +00:00
Ryan Rock
070c811d6f [CI][AMD] Skip 4 GPUs testgroup ray tests (#33305)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-01-29 21:39:53 -08:00
Wang Haoyu
c46b0cd0af [Model][Multimodal] Add explicit MusicFlamingo adapter (#32696)
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>
2026-01-30 11:01:29 +08:00
Roger Wang
8b3f0a99dd [Models] Qwen3-ASR (#33312)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-29 19:27:15 +08:00
wang.yuqi
abb34ac43a [Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 08:42:53 +00:00
ramos
36d450e3b8 Adds FunAudioChat multimodal audio model support (#2) (#33058)
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
2026-01-28 05:18:09 +00:00
Harry Mellor
2eb673a088 Add flake8-implicit-str-concat rules to Ruff (#33191)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 04:56:10 +00:00
Jared Wen
6ee7f18f33 [Logging] add --disable-access-log-for-endpoints CLI option (#30011)
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).

This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.

Fixes #29982

Signed-off-by: JaredforReal <w13431838023@gmail.com>
2026-01-26 21:49:03 +00:00
Yuxuan Zhang
bb17e8f11c [GLM-OCR] GLM-OCR with MTP Support (#33005)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-26 06:24:43 -08:00
Itay Etelis
6ca2c91b96 [Model] Use mm_position to compute mrope positions for Qwen3-Omni (#33010)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-26 13:48:07 +00:00
ltd0924
b40db4dfec [StepVL] add step vl offline example (#33054)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
2026-01-26 01:00:32 -08:00
Cyrus Leung
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
Itay Etelis
a698e8e7ad [Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-25 20:15:53 +08:00
Mark McLoughlin
1cb4341fbc [ROCm][PD] Remove unused moriio connector proxy code (#32939)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-01-23 15:59:04 +00:00
wang.yuqi
05f3d714db [Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
wang.yuqi
328cbb2773 [Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00