2088 Commits

Author SHA1 Message Date
Cyrus Leung
9ea07b41da [1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 15:25:31 +00:00
Roger Wang
b8199f6049 [Model] Re-implement Qwen3Omni Audio Encoder (#32167)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-14 15:40:30 +08:00
sangho.lee
7e6f123810 Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-14 15:33:09 +08:00
Matthew Bonanni
98f60e5acb [6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-13 05:38:52 -08:00
Cyrus Leung
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Cyrus Leung
eb28e8068d [Refactor] Remove get_encoder_dummy_data (#32241)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 09:21:23 +00:00
YunzhuLu
542a4059b2 [Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL (#32126)
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-13 09:04:29 +00:00
Andreas Karatzas
11b6af5280 [ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#32099)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-13 05:46:53 +00:00
Cyrus Leung
78d13ea9de [Model] Handle trust_remote_code for transformers backend (#32194)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 09:30:12 +08:00
Vadim Gimpelson
9f430c94bd [BUGFIX] Add missed remaping of the names of fp8 kv-scale (#32199)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-12 20:42:06 +00:00
Roger Wang
16abe6b85a [Misc] Set default torch num threads for input processing (#31879)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-12 10:28:16 -08:00
Kyungmin Lee
3d962d72ab [BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-12 10:00:45 -08:00
Matthew Bonanni
20228cb851 [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-12 09:13:56 -08:00
Cyrus Leung
8863c2b25c [Model] Standardize pooling heads (#32148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-12 17:01:49 +00:00
Jaehyun An
6bc9c8473e [MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
2026-01-12 16:39:02 +00:00
Kyungmin Lee
63ed2409e8 Add K-EXAONE-236B-A23B (#31621)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-12 16:30:50 +00:00
Cyrus Leung
600aaab8d6 [Model] Remove incorrect SupportsPP from MTP models (#32150)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-12 01:19:30 -08:00
Cyrus Leung
9101dc756c [Model] Avoid hardcoding pooling type (#32119)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-11 21:28:12 -08:00
Andy Liu
0dd63639be [MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP prediction accuracy with fp8+mtp (#32101)
Signed-off-by: Andy Liu <andyliu@roblox.com>
2026-01-10 23:14:54 -08:00
RickyChen / 陳昭儒
8020a60402 [Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classification (#32089)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-10 12:40:09 -08:00
Jeremy Teboul
07286ec5a6 [Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-10 17:52:53 +08:00
Cyrus Leung
583a90e005 [Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 04:53:24 +00:00
Akshat Shrivastava
e45946bd91 feature/issac 0.2 (#31550)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-10 03:18:05 +00:00
Lucas Kabela
ea6d067a2a [Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-09 22:01:38 -05:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Jeremy Teboul
657e9c0e18 [Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-09 19:34:51 +00:00
Shanshan Shen
08d954f036 [Doc] Add developer guide for CustomOp (#30886)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-09 16:21:11 +00:00
maang
7cdf7e2fe0 [Model] Remove redundant None check in DeepSeekOCR image input processing (#32016)
Signed-off-by: maang <maang_h@163.com>
2026-01-09 06:12:44 -08:00
Cyrus Leung
c8ed39b9dd [Model] Reorganize pooling layers (#31973)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-09 11:02:14 +00:00
Alex Brooks
dc77cb7129 [Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-09 10:28:43 +00:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
yxing-bj
fe86be66c5 [Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
Ce Zhao
1123a87892 [Model] Enable LoRA support for Pixtral (#31724)
Signed-off-by: <>
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local>
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-08 05:00:57 -08:00
tianshu-Michael-yu
03fd76c570 [Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Patrick von Platen
18d4e481d0 [Voxtral] Fix speech transcription api (#31388)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-01-08 18:34:19 +08:00
Isotr0py
2972a05473 [MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 02:33:48 -08:00
Cyrus Leung
5576227bc1 [Model] Standardize common vision encoders (#31947)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:33:16 -08:00
Cyrus Leung
d1b6fe007f [Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
BingjiaWang
96fcd3c267 [Misc] Support qwen3-next lora (#31719)
2026-01-08 09:27:50 +00:00
Isotr0py
eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
Zyyeric
63baa28cf5 [Model] Enable LoRA support for tower and connector in GLM4-V (#31652)
Signed-off-by: Zyyeric <eric1976808123@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 15:45:53 +08:00
ShaanveerS
9572f74f15 [Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com>
2026-01-08 14:50:16 +08:00
roikoren755
bf184a6621 Enable quantized attention in NemotronH models (#31898)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-01-07 17:37:19 +00:00
Cyrus Leung
b7036c87a1 [Refactor] Clean up pooler modules (#31897)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 00:07:43 +08:00
Jared Wen
974138751b [Refactor] GLM-ASR Modeling (#31779)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-07 13:08:29 +00:00
Andy Liu
d111bc53ad [Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on (#31757)
Signed-off-by: Andy Liu <andyliu@roblox.com>
2026-01-07 09:18:52 +00:00
BlankR
0790f07695 [Misc] Improve error messages for unsupported types and parameters (#30593)
Signed-off-by: BlankR <hjyblanche@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-07 09:00:16 +00:00
maang
1f33e38e81 [Model] Cleanup: Remove redundant manual definition of make_empty_intermediate_tensors in GLM-4-MoE (#31869)
Signed-off-by: maang <maang_h@163.com>
2026-01-07 08:18:28 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
482914849c [BugFix] LoRA: Support loading base_layer of experts (#31104)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-07 14:49:39 +08:00