WeiQing Chen
|
e283976f3a
|
[Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer (#24443)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
|
2025-09-09 00:24:11 -07:00 |
|
Benji Beck
|
37a6fa95fd
|
Migrate Qwen2 inputs to TensorSchema (#23475)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-06 20:07:31 -07:00 |
|
Isotr0py
|
53b19ccdd5
|
[Core] Allow disabling TP sharding for parallel Linear layer (#23024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-05 22:53:58 -07:00 |
|
WeiQing Chen
|
a0e0efd6bd
|
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-09-01 16:56:56 +00:00 |
|
Cyrus Leung
|
fe8d7b6f03
|
[Model] Interface to enable batch-level DP support (#23733)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-27 06:41:22 -07:00 |
|
Jee Jee Li
|
9b5f64238f
|
[Bugfix] Fix Qwen25VL packed_modules_mapping (#23604)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-26 01:09:14 -07:00 |
|
zifeitong
|
a71e4765cc
|
[Bugfix] Fix Qwen2.5-VL quantized model weights loading (#23512)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2025-08-25 10:40:22 +08:00 |
|
Cyrus Leung
|
5efd6905bc
|
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 23:42:28 +08:00 |
|
TJian
|
1298c67795
|
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-19 15:25:57 +00:00 |
|
Yuanyuan Chen
|
6772bb0f7d
|
Remove unnecessary CUDA sync of qwen image and video preprocess (#22792)
Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-13 06:07:28 -07:00 |
|
Roger Wang
|
08b751ba74
|
Implicit language-model-only mode via limit-mm-per-prompt (#22299)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: Linkun <github@lkchen.net>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
Co-authored-by: Junhao Li <streaver91@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com>
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: lkchen <github@lkchen.net>
|
2025-08-08 22:21:40 -07:00 |
|
vllmellm
|
cbc8457b26
|
[Model] Switch to Fused RMS norm in Qwen2.5_VL model. (#22184)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-08-06 23:05:24 -07:00 |
|
vllmellm
|
d3a6f2120b
|
[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069)
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
|
2025-08-01 23:53:18 -07:00 |
|
vllmellm
|
ee2eb6ecd8
|
[Model] Qwen2.5 VL SiLU-and-Mul (#22066)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-08-01 19:34:37 -07:00 |
|
Cyrus Leung
|
82de9b9d46
|
[Misc] Automatically resolve HF processor init kwargs (#22005)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-31 22:44:10 -07:00 |
|
Avshalom Manevich
|
a0f8a79646
|
[fix] fix qwen image_embeds input (#21049)
Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>
|
2025-07-16 15:17:20 +00:00 |
|
Cyrus Leung
|
b024a42e93
|
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-03 08:18:30 +00:00 |
|
Kyle Sayers
|
9025a9a705
|
[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046)
|
2025-07-01 19:20:34 +09:00 |
|
Lu Fang
|
b1098b4072
|
[Bugfix] Fix the linter (#19826)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 21:44:41 -07:00 |
|
Woosuk Kwon
|
d49adea1f9
|
[Multimodal] Use fast processor for Qwen2/2.5-VL (#19789)
|
2025-06-18 15:49:40 -07:00 |
|
Russell Bryant
|
14fdd21d39
|
[Core] More fixes to MultiModalEmbeddings type handling (#19715)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 22:48:29 +00:00 |
|
Russell Bryant
|
90f9c2eb5c
|
[V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-16 13:32:15 -04:00 |
|
Isotr0py
|
2db9044ab6
|
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-14 15:13:08 +00:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Isotr0py
|
ec2dcd80bc
|
[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-03 09:08:20 +00:00 |
|
Isotr0py
|
a35ca765a5
|
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-01 11:06:57 +08:00 |
|
Vadim Gimpelson
|
67da5720d4
|
[PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding (#17973)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-05-15 23:31:02 -07:00 |
|
Harry Mellor
|
26d0419309
|
Update deprecated type hinting in models (#18132)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-14 22:06:50 -07:00 |
|
Cyrus Leung
|
015815fe01
|
[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor (#17838)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 05:39:21 -07:00 |
|
Isotr0py
|
c3e9d5060e
|
[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE (#17726)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-07 04:51:33 +00:00 |
|
Jee Jee Li
|
4283a28c2f
|
[Bugfix] Fix QWen2 VL multimodal mapping (#17240)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-27 05:53:23 +00:00 |
|
Woosuk Kwon
|
b411418ff0
|
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-24 02:49:33 -07:00 |
|
Yang Fan
|
571e8dd65e
|
[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
|
2025-04-22 12:23:17 +00:00 |
|
Yang Fan
|
26c0406555
|
[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni (#16907)
|
2025-04-21 10:25:21 +00:00 |
|
Yang Fan
|
2c1bd848a6
|
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
|
2025-04-18 23:14:36 -07:00 |
|
Nicolò Lucchesi
|
d55244df31
|
[Model] Add SupportsMultiModal.get_language_model interface (#16007)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-09 04:12:54 -07:00 |
|
Isotr0py
|
47c7126213
|
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL (#15273)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-21 10:32:33 +00:00 |
|
Isotr0py
|
1e508343e1
|
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 19:18:04 -07:00 |
|
Cyrus Leung
|
601bd3268e
|
[Misc] Clean up type annotation for SupportsMultiModal (#14794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-14 00:59:56 -07:00 |
|
yexin(叶鑫)
|
70b808fe1a
|
[Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync (#14377)
Signed-off-by: cynthieye <987073381@qq.com>
|
2025-03-11 07:39:56 +00:00 |
|
Yang Liu
|
9b61dd41e7
|
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)
|
2025-02-28 07:36:08 -08:00 |
|
Isotr0py
|
7864875879
|
[Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-27 17:30:39 +00:00 |
|
Harry Mellor
|
cdc1fa12eb
|
Remove unused kwargs from model definitions (#13555)
|
2025-02-24 17:13:52 -08:00 |
|
Jee Jee Li
|
105b8ce4c0
|
[Misc] Reduce LoRA-related static variable (#13166)
|
2025-02-22 00:21:30 -08:00 |
|
燃
|
041e294716
|
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)
|
2025-02-19 23:04:30 -08:00 |
|
Jee Jee Li
|
512368e34a
|
[Misc] Qwen2.5 VL support LoRA (#13261)
|
2025-02-19 18:37:55 -08:00 |
|
Cyrus Leung
|
377d10bd14
|
[VLM][Bugfix] Pass processor kwargs properly on init (#13516)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-19 13:13:50 +00:00 |
|
Isotr0py
|
7fdaaf48ef
|
[Bugfix] Fix qwen2.5-vl image processor (#13286)
|
2025-02-15 03:00:11 -08:00 |
|
燃
|
02ed8a1fbe
|
[Misc] Qwen2.5-VL Optimization (#13155)
|
2025-02-13 06:17:57 -08:00 |
|
Isotr0py
|
4c8dd12ef3
|
[Misc] Add qwen2.5-vl BNB support (#12944)
|
2025-02-08 04:24:47 -08:00 |
|