Commit Graph

93 Commits

Author SHA1 Message Date
Max Hu
9c3fe9936b Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
2026-02-27 20:20:23 +08:00
Cyrus Leung
845ee348ef [Misc] Standardize handling of mm_processor_kwargs.size (#35284)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-26 13:05:46 +00:00
Cyrus Leung
987506bca6 [Refactor] Simplify dummy data generation (#35025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-22 20:55:27 -08:00
Isotr0py
71cd89264f [MM Encoder] Add Triton ViT attention backend (#32183)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 06:32:47 -08:00
Cyrus Leung
372b2e762a [Bugfix] Standardize getting number of image patches/tokens (#34358)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:47:01 -08:00
Isotr0py
0ab06100f4 [Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-11 09:37:40 -08:00
Harry Mellor
1e9204bff3 Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-11 04:13:23 -08:00
Isotr0py
192ad4648b [Bugfix] Fix interns1-pro initialization and PP (#33793)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-04 17:54:45 +00:00
zxy
a3acfa1071 [Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-03 05:49:45 -08:00
Cyrus Leung
51550179fc [Refactor] Define MM data parser in processing info instead of processor itself (#33260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 13:55:17 +08:00
Cyrus Leung
dcd80206b7 [Chore] Update type annotation of input_ids in model forward (#33063)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 06:02:10 -08:00
Cyrus Leung
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
JJJYmmm
7e67df5570 [Bugfix] fix encoder cache hang in Qwen3VL (#32684)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-25 05:17:31 +00:00
Isotr0py
9ad7f89f55 [Models]: Make Multimodal config implicit in ViT implementation (#31972)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-24 20:34:26 +08:00
Cyrus Leung
193069d129 [5/N] Initialize MM components in context managers (Q-Z) (#32695)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 19:10:23 +00:00
JJJYmmm
04a9e064db [Bugfix] fix the ima issue of qwen-vit (#32687)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
2026-01-20 17:21:25 +00:00
Cyrus Leung
4753f3bf69 [Model] Use context managers for encoder- and LM-only mode (#32605)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 11:43:38 +08:00
Cyrus Leung
9ea07b41da [1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 15:25:31 +00:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Jee Jee Li
ce1eafd1a5 [Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
2025-12-26 04:48:20 -08:00
dengyunyang
8f8f469b1b [BugFix] skip language model in Encoder (#30242)
Signed-off-by: dengyunyang <584797741@qq.com>
2025-12-22 05:25:59 -08:00
Roger Wang
f5f51e5931 [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Shanshan Shen
87b4d1557d [CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
ZiTian Zhao
ae88aada38 [Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29752)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: deitxfge <huhaibo1990@126.com>
2025-12-14 05:24:56 -08:00
Harry Mellor
cf3eacfe58 Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Cyrus Leung
3a3b06ee70 [Misc] Improve error message for is_multimodal (#30483)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 06:39:51 -08:00
Cyrus Leung
979f50efd0 [Deprecation] Remove fallbacks for embed_input_ids and embed_multimodal (#30458)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 06:58:23 +00:00
Cyrus Leung
671427efbf [Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Cyrus Leung
c46b932df2 [Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 07:57:28 +00:00
Tao Yun
6dcb07f676 support qwen3-vl handle requests with embeddings (#30037)
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Cyrus Leung
fe3398fab2 [Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:10 -08:00
Mingyuan Ma
460d8bbf2d Remove upstream fa checks (#29471)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 05:52:42 -08:00
Cyrus Leung
b34e8775a3 Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 22:43:18 -08:00
EanWang211123
37b15e97e8 [Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-27 22:05:45 -08:00
Roger Wang
0ff70821c9 [Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Lukas Geiger
a9705a290a [Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 22:04:23 -08:00
Harry Mellor
a8b70304d6 Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Lukas Geiger
3d4e7d34be [Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-19 05:43:01 +00:00
Canlin Guo
b9489f51e1 [Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-18 11:51:54 +00:00
Lukas Geiger
07cadab27a [Model][Qwen3VL] Cache positional embedding indices (#28475)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-15 19:03:09 +00:00
Lukas Geiger
f05d474c8a [Model][Qwen3VL] Use mm_position to compute mrope positions (#28730)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 19:45:11 -08:00
GuanH
cec275efce [Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663)
Signed-off-by: GuanH <guansdrailib@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 18:44:27 +00:00
Shanshan Shen
41b92f7d38 [Model][MM] Extract conv layer as CustomOp (#28455)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 19:16:13 +08:00
Harry Mellor
97d1c99302 Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 19:14:33 -08:00
Lukas Geiger
cbb799e314 [Model][Qwen3VL] Simplify get_mrope_input_positions using numpy (#28302)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-12 02:55:10 +00:00
Jee Jee Li
9d1c474704 [LoRA][1/N]Remove LoRA extra vocab (#28382)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-11 11:06:21 -08:00
Cyrus Leung
afffd3cc8a [Model] Pass mm_features directly into get_mrope_input_positions (#28399)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 21:14:48 +08:00
Matthew Bonanni
b30dfa03c5 [Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Lukas Geiger
9973e6e04a [Model][Qwen3VL] Slighly speedup fast_pos_embed_interpolate (#28434)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-11 10:35:10 +00:00
Cyrus Leung
d0e186c16f [V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 00:30:06 +08:00