Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Isotr0py
|
2972a05473
|
[MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-08 02:33:48 -08:00 |
|
Jee Jee Li
|
ce1eafd1a5
|
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
|
2025-12-26 04:48:20 -08:00 |
|
dengyunyang
|
8f8f469b1b
|
[BugFix] skip language model in Encoder (#30242)
Signed-off-by: dengyunyang <584797741@qq.com>
|
2025-12-22 05:25:59 -08:00 |
|
Shanshan Shen
|
3bd9c49158
|
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873)
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-15 19:08:16 -08:00 |
|
Shanshan Shen
|
87b4d1557d
|
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-15 11:13:32 +08:00 |
|
Ilya Markov
|
3224ea9915
|
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-14 18:15:11 +08:00 |
|
Harry Mellor
|
cf3eacfe58
|
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 20:45:23 +00:00 |
|
Cyrus Leung
|
671427efbf
|
[Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 13:40:02 +00:00 |
|
Cyrus Leung
|
c46b932df2
|
[Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 07:57:28 +00:00 |
|
Cyrus Leung
|
b286a311c2
|
[Chore] Deprecate merge_by_field_config arg (#30035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-04 17:21:24 +00:00 |
|
Mingyuan Ma
|
460d8bbf2d
|
Remove upstream fa checks (#29471)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 05:52:42 -08:00 |
|
Roger Wang
|
0ff70821c9
|
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-24 04:18:55 +00:00 |
|
ZiTian Zhao
|
d84d8f4429
|
Fix EVS crash when using video_embeds inputs in Qwen2.5-VL (#29232)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 06:48:59 -08:00 |
|
Harry Mellor
|
a8b70304d6
|
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 09:06:36 -08:00 |
|
Lukas Geiger
|
3d4e7d34be
|
[Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-19 05:43:01 +00:00 |
|
Canlin Guo
|
b9489f51e1
|
[Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-18 11:51:54 +00:00 |
|
Lukas Geiger
|
5a87076d6e
|
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-16 17:37:15 +00:00 |
|
Shanshan Shen
|
41b92f7d38
|
[Model][MM] Extract conv layer as CustomOp (#28455)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-14 19:16:13 +08:00 |
|
Harry Mellor
|
97d1c99302
|
Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 19:14:33 -08:00 |
|
Canlin Guo
|
bc5bd45c7d
|
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#28271)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-12 15:56:47 +00:00 |
|
Cyrus Leung
|
afffd3cc8a
|
[Model] Pass mm_features directly into get_mrope_input_positions (#28399)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-11 21:14:48 +08:00 |
|
Matthew Bonanni
|
b30dfa03c5
|
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-11 07:40:44 -05:00 |
|
Cyrus Leung
|
d0e186c16f
|
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-11 00:30:06 +08:00 |
|
Lukas Geiger
|
e0919f331d
|
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-07 12:14:29 +00:00 |
|
Harry Mellor
|
c0a4b95d64
|
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 04:23:17 +00:00 |
|
Lucas Kabela
|
4bf56c79cc
|
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-07 00:16:03 +00:00 |
|
Lucas Kabela
|
55011aef24
|
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-03 11:12:15 -08:00 |
|
Yan Ma
|
7e2729b57e
|
[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-01 04:45:02 +00:00 |
|
Zhewen Li
|
e806178d2a
|
[BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790)
Signed-off-by: zhewenli <zhewenli@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-30 07:54:44 +00:00 |
|
Chenheli Hua
|
48eb8eba58
|
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (#27760)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-29 23:17:48 +00:00 |
|
JartX
|
7568a282b9
|
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-10-29 16:55:35 +00:00 |
|
Lukas Geiger
|
0d8161b075
|
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-29 05:28:20 +00:00 |
|
Lucas Kabela
|
94666612a9
|
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
|
2025-10-28 22:36:43 +00:00 |
|
Cyrus Leung
|
cbd5e07a51
|
[Model] Use merge_by_field_config for MM models (Qwen series) (#27546)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-27 05:38:05 +00:00 |
|
JartX
|
65d2cf9511
|
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-10-26 15:08:52 +08:00 |
|
Isotr0py
|
42efe609ba
|
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer (#27418)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-24 07:32:47 +00:00 |
|
Jonathan Chen
|
ca76486a16
|
[Chore] Separate out vllm.utils.platform_utils.py (#27374)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
|
2025-10-23 19:08:06 +00:00 |
|
Cyrus Leung
|
14e2f1231e
|
[Bugfix] Make get_mrope_input_positions instance methods (#27342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 08:38:34 -07:00 |
|
Roger Wang
|
c3a2c6ac5f
|
[MM][Core] Decouple ViT backend from LM backend (#27061)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-21 00:30:10 -07:00 |
|
Lukas Geiger
|
5c2acb270a
|
[Models][QwenVL] Remove unnecessary .contiguous() calls (#27106)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-10-18 07:05:05 -07:00 |
|
ZiTian Zhao
|
7bb736d00e
|
Fix Qwen2.5 VL image grid docstring (#27033)
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
2025-10-16 09:57:36 -07:00 |
|
Cyrus Leung
|
d2f816d6ff
|
[Bugfix] Standardize merging multimodal embeddings (#26771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 09:36:21 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
dsinghvi
|
727144bed1
|
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-10-11 07:21:04 +00:00 |
|
tomeras91
|
b8f603cebe
|
[Model] EVS support for nano_nemotron_vl (#26269)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
|
2025-10-07 00:23:37 +08:00 |
|
Harry Mellor
|
557b2e961d
|
Remove all cases of fmt: on/off (#26253)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 09:18:14 -07:00 |
|
Harry Mellor
|
4e256cadc2
|
Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 09:18:11 -07:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
TJian
|
9c5ee91b2a
|
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-10-02 22:34:53 -07:00 |
|