Commit Graph

960 Commits

Author SHA1 Message Date
csy0225
8b45c58fe9 [Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
(cherry picked from commit c3b40dc3e7)
2026-02-02 02:16:23 -08:00
Roger Wang
5042815ab6 [Models] Kimi-K2.5 (#33131)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f988e1)
2026-01-28 02:16:28 -08:00
Cyrus Leung
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
JJJYmmm
7e67df5570 [Bugfix] fix encoder cache hang in Qwen3VL (#32684)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-25 05:17:31 +00:00
Patrick von Platen
3f3f89529d [Voxtral] Add new streaming arch (#32861)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:41:52 +01:00
Maximilien de Bayser
ff365eea94 Support bge-m3 sparse embeddings and colbert embeddings (#14526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2026-01-22 23:52:57 +08:00
Nicolò Lucchesi
ea6102b85d [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-22 10:50:37 +00:00
Andreas Karatzas
eb1629da24 [ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 13:55:25 +08:00
Huy Do
f5fdec8ce2 Upgrade transformers-4.57.5 (#32287)
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-01-22 05:19:19 +00:00
Kim Hee Su
7727ce35c2 [Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Alex Brooks
27b81e010d [Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-21 11:11:52 +08:00
Lucas Wilkinson
2261340806 [Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-20 15:05:48 -05:00
wang.yuqi
c88860d759 [Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-19 14:07:46 +00:00
Yuxuan Zhang
71832ba71e [GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
2026-01-19 01:18:38 -08:00
Li Xie
c826c72a96 [Model] Support Step1 Model (#32511)
Signed-off-by: xieli <xieli@stepfun.com>
2026-01-18 10:20:46 +00:00
Isotr0py
8cc26acd8b [Performance] Improve Triton prefill attention kernel's performance (#32403)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-17 20:19:59 -08:00
wang.yuqi
4ae77dfd42 [Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
ltd0924
709502558c [Model] Add Step3vl 10b (#32329)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-15 19:04:16 -08:00
Cyrus Leung
90db5b31e4 [Refactor] Move top-level dummy data generation to registry (#32310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 02:17:46 -08:00
sangho.lee
7e6f123810 Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-14 15:33:09 +08:00
Andreas Karatzas
9d0d7f48d5 [ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-14 07:21:26 +00:00
Roberto L. Castro
8ef50d9a6b [Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
2026-01-13 15:22:53 -08:00
Wentao Ye
f28125d87b [Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-13 10:58:18 -08:00
Cyrus Leung
252c011012 [Refactor] Remove MultiModalProfiler (#32254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 15:10:20 +00:00
Cyrus Leung
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Andreas Karatzas
5e714f7ff4 [ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-12 22:33:59 -08:00
Jaehyun An
6bc9c8473e [MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
2026-01-12 16:39:02 +00:00
Kyungmin Lee
63ed2409e8 Add K-EXAONE-236B-A23B (#31621)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-12 16:30:50 +00:00
Jeremy Teboul
07286ec5a6 [Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-10 17:52:53 +08:00
Cyrus Leung
583a90e005 [Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 04:53:24 +00:00
Akshat Shrivastava
e45946bd91 feature/issac 0.2 (#31550)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-10 03:18:05 +00:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Jeremy Teboul
657e9c0e18 [Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-09 19:34:51 +00:00
Divakar Verma
a1648c4045 [ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-09 04:04:33 +00:00
Robert Shaw
5825bbc1f7 [Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-08 19:07:45 -05:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
yxing-bj
fe86be66c5 [Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
tianshu-Michael-yu
03fd76c570 [Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Isotr0py
eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
Andreas Karatzas
8dd2419fa9 [CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 02:58:01 +00:00
Andreas Karatzas
2a42ae790d [ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-06 23:21:15 +00:00
wang.yuqi
43d384bab4 [CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-06 19:30:05 +08:00
Isotr0py
ee2e69d6cd [Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:44:22 -08:00
amitz-nv
ee21291825 [Model] Nemotron Parse 1.1 Support (#30864)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-05 13:00:14 -08:00
Isotr0py
51e38a8e30 [Misc] Enable Paligemma's PrefixLM attention mask computation (#31725)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 03:31:49 +08:00
Isotr0py
6aa5b18e1d [v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:00:23 +08:00
wang.yuqi
911d38ed99 [Model] Let more models to support the score template. (#31335)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 11:54:26 +00:00
wangxiyuan
bb4337b34c [Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
Isotr0py
367856de14 [CI/Build] Revive skipped reward models e2e test (#31665)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-05 02:33:46 +00:00