Commit Graph

1975 Commits

Author SHA1 Message Date
Kim Hee Su
7727ce35c2 [Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Lucas Kabela
c80f92c14d [Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-20 23:54:20 -08:00
Paco Xu
360aa93f8f [Docs] Fix GitHub handle in governance process (#32582)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-01-21 07:07:50 +00:00
Robert Shaw
c78ee240b3 Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725) 2026-01-21 00:21:06 +00:00
Cyrus Leung
09194b90a5 [Doc] Update docs for MM model development with context usage (#32691)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 10:37:35 -08:00
TJian
c025263ddd [Doc] [ROCm] Update ROCm getting started doc (#32580)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 09:20:08 -08:00
whx
4ca62a0dbd [PluggableLayer][1/N] Define PluggableLayer (#32331)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-20 16:19:21 +00:00
杨朱 · Kiki
bb9172030e [Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.

Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-20 12:28:41 +00:00
Jackmin801
12dab78f49 [Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-20 10:15:20 +08:00
lon
73f2a81c75 docs: prefix caching seems quite outdated (#28784)
Signed-off-by: lon <114724657+longregen@users.noreply.github.com>
Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <russell.bryant@gmail.com>
2026-01-19 11:49:52 -08:00
wang.yuqi
c88860d759 [Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-19 14:07:46 +00:00
Yuxuan Zhang
71832ba71e [GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
2026-01-19 01:18:38 -08:00
Li Xie
c826c72a96 [Model] Support Step1 Model (#32511)
Signed-off-by: xieli <xieli@stepfun.com>
2026-01-18 10:20:46 +00:00
Robert Shaw
4a6af8813f [MoE Refactor] Move Test Impl into Test Dirs (#32129)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2026-01-18 12:16:59 +08:00
Simon Mo
5a3050a089 [Docs][Governance] Add @robertshaw2-redhat to lead maintainers group (#32498)
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-16 18:35:49 -08:00
wang.yuqi
4ae77dfd42 [Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
ltd0924
709502558c [Model] Add Step3vl 10b (#32329)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-15 19:04:16 -08:00
rongfu.leng
3a4e10c847 [Benchmark] [Feature] add vllm bench sweep startup command (#32337)
Signed-off-by: lengrongfu <lenronfu@gmail.com>
2026-01-15 09:25:46 +00:00
Cyrus Leung
9ea07b41da [1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 15:25:31 +00:00
sangho.lee
7e6f123810 Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-14 15:33:09 +08:00
Michael Goin
6388b50058 [Docs] Add docs about OOT Quantization Plugins (#32035)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-14 15:25:45 +08:00
Yi Liu
50632adc58 Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
2026-01-14 07:11:30 +00:00
Dmitry Tokarev
46f8c6b725 Fix CUDA 13 wheel installation doc (#32276)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
2026-01-13 10:48:37 -08:00
Nicolò Lucchesi
8c8653b672 [Docs] Nixl Usage recommend fail kv_load_failure_policy (#32198)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-13 12:51:57 +00:00
Roy Wang
44c34f22d9 [Doc] Update installation from source command (#32239)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-01-12 23:10:27 -08:00
Andrew Bennett
f243abc92d Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-13 03:41:47 +00:00
Matthew Bonanni
20228cb851 [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-12 09:13:56 -08:00
Jaehyun An
6bc9c8473e [MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
2026-01-12 16:39:02 +00:00
Kyungmin Lee
63ed2409e8 Add K-EXAONE-236B-A23B (#31621)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-12 16:30:50 +00:00
Andy Zhang
95e53d907c doc: Update model references in supported_models.md (#32188) 2026-01-12 08:15:28 -08:00
Andy Zhang
e68b0dad8b doc: Update model name for Qwen3-Coder in documentation (#32185)
Signed-off-by: Andy Zhang <xiazhang@microsoft.com>
2026-01-12 07:10:50 -08:00
Or Ozeri
9cddbdba6d OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 15:00:43 +00:00
RickyChen / 陳昭儒
a5f89ae296 [Doc] Add documentation for offline API docs feature (#32134)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-12 10:33:48 +00:00
Jee Jee Li
05e8981234 [Doc] Improve LoRA docs (#32159)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-12 02:19:17 -08:00
XlKsyt
899541bdb1 [doc] fix broken links (#32158)
Signed-off-by: minimAluminiumalism <caixuesen@outlook.com>
2026-01-12 10:18:38 +00:00
wang.yuqi
60446cd684 [Model] Improve multimodal pooling examples (#32085)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-12 07:54:09 +00:00
Cyrus Leung
ef96fa3f1f [Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 20:27:27 -08:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Jeremy Teboul
657e9c0e18 [Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-09 19:34:51 +00:00
Shanshan Shen
08d954f036 [Doc] Add developer guide for CustomOp (#30886)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-09 16:21:11 +00:00
inkcherry
4505849b30 [ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
Lucas Kabela
f16bfbe5bc [Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-08 14:33:24 -05:00
Jee Jee Li
49568d5cf9 [Doc] Improve MM models LoRA notes (#31979)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 08:55:22 -08:00
yxing-bj
fe86be66c5 [Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
Chauncey
1da3a5441a [Docs]: update claude code url (#31971)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 14:04:55 +00:00
Ce Zhao
1123a87892 [Model] Enable LoRA support for Pixtral (#31724)
Signed-off-by: <>
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local>
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-08 05:00:57 -08:00
tianshu-Michael-yu
03fd76c570 [Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Isotr0py
eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
ShaanveerS
9572f74f15 [Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com>
2026-01-08 14:50:16 +08:00