csy0225
8b45c58fe9
[Models] Step-3.5-Flash ( #33523 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com >
Co-authored-by: xiewuxun <xiewuxun@stepfun.com >
Co-authored-by: zetaohong <i-hongzetao@stepfun.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
(cherry picked from commit c3b40dc3e7 )
2026-02-02 02:16:23 -08:00
Or Ozeri
fe18ce4d3f
Revert "Enable Cross layers KV cache layout at NIXL Connector ( #30207 )" ( #33241 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
(cherry picked from commit 2e8de86777 )
2026-01-28 11:44:59 -08:00
Roger Wang
5042815ab6
[Models] Kimi-K2.5 ( #33131 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn >
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f988e1 )
2026-01-28 02:16:28 -08:00
Cyrus Leung
11b556878b
[Refactor] Use data parser for matching data items to multi-modal UUIDs ( #32955 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 15:00:28 +08:00
zhanqiuhu
151e5451c2
[Doc] Add Qwen2.5 models to batch invariance tested models ( #33016 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-01-25 09:20:46 +00:00
7. Sun
ff6c1da4e6
[Docs] Fix Apple silicon include path in CPU installation docs ( #32977 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-25 01:51:49 +00:00
TJian
1ebdff412a
[DOC] [ROCm] Update doc for v0.14.1 ( #32998 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-25 09:13:21 +08:00
Maryam Tahhan
203d0bc0c2
[CPU] Improve CPU Docker build ( #30953 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-01-24 17:08:24 +00:00
Louie Tsai
719ac592ed
Update CPU doc according to feedback ( #32963 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-24 16:02:44 +00:00
david guan
bc0d291bfe
feat: Complete LoRA support for MiniMaxM2 Fixes #32736 ( #32763 )
...
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-01-24 20:48:46 +08:00
Roy Wang
5c86a89805
[docs] Update governance process links ( #32995 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-23 23:32:44 -08:00
Michael Goin
d0cbac5827
[Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install ( #32948 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Shengqi Chen <i@harrychen.xyz >
2026-01-23 19:15:17 -08:00
ruizcrp
c0d820457a
Auth_token added in documentation as it is required ( #32988 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-24 03:03:05 +00:00
Orion Reblitz-Richardson
68b0a6c1ba
[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests ( #30443 )
...
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com >
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-23 10:22:56 -08:00
sangbumlikeagod
9b77bb790d
[Frontend] add logprob, compression_rate to 'verbose_json' features ( #31059 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
2026-01-23 16:35:13 +00:00
wang.yuqi
05f3d714db
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest ( #32905 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
Eldar Kurtić
44f08af3a7
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) ( #30141 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com >
2026-01-22 13:29:57 -07:00
Maximilien de Bayser
ff365eea94
Support bge-m3 sparse embeddings and colbert embeddings ( #14526 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
2026-01-22 23:52:57 +08:00
Cyrus Leung
d117a4d1a9
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) ( #30200 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-22 12:44:22 +00:00
wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest ( #32574 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-22 10:32:44 +00:00
liranschour
64e3d67ac0
Enable Cross layers KV cache layout at NIXL Connector ( #30207 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
2026-01-22 10:12:58 +00:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) ( #32744 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-21 11:38:04 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority ( #32414 )
2026-01-21 08:22:33 -05:00
Kim Hee Su
7727ce35c2
[Model] Add Eagle2.5-8B Vision-Language Model support ( #32456 )
...
Signed-off-by: kimheesu <wlskaka4@gmail.com >
2026-01-21 09:39:53 +00:00
Lucas Kabela
c80f92c14d
[Documentation] Fix typo in docs/design/torch_compile_multimodal.md ( #32741 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-20 23:54:20 -08:00
Paco Xu
360aa93f8f
[Docs] Fix GitHub handle in governance process ( #32582 )
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io >
2026-01-21 07:07:50 +00:00
Robert Shaw
c78ee240b3
Revert "[PluggableLayer][1/N] Define PluggableLayer" ( #32725 )
2026-01-21 00:21:06 +00:00
Cyrus Leung
09194b90a5
[Doc] Update docs for MM model development with context usage ( #32691 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 10:37:35 -08:00
TJian
c025263ddd
[Doc] [ROCm] Update ROCm getting started doc ( #32580 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 09:20:08 -08:00
whx
4ca62a0dbd
[PluggableLayer][1/N] Define PluggableLayer ( #32331 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-20 16:19:21 +00:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric ( #32661 )
...
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com >
2026-01-20 12:28:41 +00:00
Jackmin801
12dab78f49
[Feat] allow inplace loading lora ( #31326 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com >
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-20 10:15:20 +08:00
lon
73f2a81c75
docs: prefix caching seems quite outdated ( #28784 )
...
Signed-off-by: lon <114724657+longregen@users.noreply.github.com >
Signed-off-by: Russell Bryant <russell.bryant@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <russell.bryant@gmail.com >
2026-01-19 11:49:52 -08:00
wang.yuqi
c88860d759
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs ( #32577 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-19 14:07:46 +00:00
Yuxuan Zhang
71832ba71e
[GLM-4.7] GLM Model support for GLM-Lite ( #31386 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Yuxuan Zhang <2448370773@qq.com >
2026-01-19 01:18:38 -08:00
Li Xie
c826c72a96
[Model] Support Step1 Model ( #32511 )
...
Signed-off-by: xieli <xieli@stepfun.com >
2026-01-18 10:20:46 +00:00
Robert Shaw
4a6af8813f
[MoE Refactor] Move Test Impl into Test Dirs ( #32129 )
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2026-01-18 12:16:59 +08:00
Simon Mo
5a3050a089
[Docs][Governance] Add @robertshaw2-redhat to lead maintainers group ( #32498 )
...
Co-authored-by: Claude <noreply@anthropic.com >
2026-01-16 18:35:49 -08:00
wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest ( #32395 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-16 06:17:04 +00:00
ltd0924
709502558c
[Model] Add Step3vl 10b ( #32329 )
...
Signed-off-by: luotingdan <luotingdan@stepfun.com >
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: luotingdan <luotingdan@stepfun.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-15 19:04:16 -08:00
rongfu.leng
3a4e10c847
[Benchmark] [Feature] add vllm bench sweep startup command ( #32337 )
...
Signed-off-by: lengrongfu <lenronfu@gmail.com >
2026-01-15 09:25:46 +00:00
Cyrus Leung
9ea07b41da
[1/N] Reorganize multimodal processing code ( #32327 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 15:25:31 +00:00
sangho.lee
7e6f123810
Add Molmo2 multimodal model support ( #30997 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-14 15:33:09 +08:00
Michael Goin
6388b50058
[Docs] Add docs about OOT Quantization Plugins ( #32035 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-14 15:25:45 +08:00
Yi Liu
50632adc58
Consolidate Intel Quantization Toolkit Integration in vLLM ( #31716 )
...
Signed-off-by: yiliu30 <yi4.liu@intel.com >
2026-01-14 07:11:30 +00:00
Dmitry Tokarev
46f8c6b725
Fix CUDA 13 wheel installation doc ( #32276 )
...
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com >
2026-01-13 10:48:37 -08:00
Nicolò Lucchesi
8c8653b672
[Docs] Nixl Usage recommend fail kv_load_failure_policy ( #32198 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-13 12:51:57 +00:00
Roy Wang
44c34f22d9
[Doc] Update installation from source command ( #32239 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-01-12 23:10:27 -08:00
Andrew Bennett
f243abc92d
Fix various typos found in docs ( #32212 )
...
Signed-off-by: Andrew Bennett <potatosaladx@meta.com >
2026-01-13 03:41:47 +00:00
Matthew Bonanni
20228cb851
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py ( #32054 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-12 09:13:56 -08:00