Commit Graph

69 Commits

Author SHA1 Message Date
Tyler Michael Smith
a5354b3ed2 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00
Harry Mellor
ec152c8748 Fix GPTQ model loading in Transformers backend (#25770)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 12:18:20 +00:00
Cyrus Leung
27d7638b94 [Bugfix] Merge MM embeddings by index instead of token IDs (#16229)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-27 08:15:12 +00:00
XuruiYang
845adb3ec6 [Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Cyrus Leung
5089fd749c [V0 Deprecation] Remove V0 logic from get_input_embeddings interface (#25242)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-19 11:10:52 +00:00
Jee Jee Li
9ad0688e43 [Bugfix] Fix hidden_size for multimodal classification model (#24501)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-09 10:37:25 -07:00
Li, Jiang
67cee40da0 [CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-28 11:57:05 +00:00
Chenheli Hua
c905684cfe [Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. (#23686)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-08-26 20:05:34 -07:00
ZiTian Zhao
534c45b962 Improve fast_topk function with type hints and documentation (#22530)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
2025-08-09 20:25:42 -07:00
Chauncey
17eaaef595 [Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-08-07 19:20:21 -07:00
Harry Mellor
796bae07c5 Update transformers to v4.55 (#21931)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 22:56:14 -07:00
Chenxi Yang
e5949e5ae0 Remove index_put from MM embeddings merging (#22105)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
2025-08-03 22:15:14 -07:00
Kyle Sayers
9025a9a705 [Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046)
2025-07-01 19:20:34 +09:00
Simon Mo
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Hyogeun Oh (오효근)
a68e293cb9 [Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00
Isotr0py
f07a673eb2 [Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00
Harry Mellor
26d0419309 Update deprecated type hinting in models (#18132)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 22:06:50 -07:00
Jee Jee Li
822de7fb94 [Misc] Split model loader (#17712)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-07 12:42:26 +08:00
Harry Mellor
d6484ef3c3 Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Michael Goin
bd6028d6b0 Optimized topk for topk=1 (Llama-4) (#16512)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-12 14:21:08 +08:00
Alex Brooks
c2e7507ad4 [Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-03-31 13:23:53 +00:00
Naveassaf
3aa2b6a637 [Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-03-31 20:35:14 +08:00
youkaichao
555aa21905 [V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-31 20:22:34 +08:00
Harry Mellor
97cfa65df7 Add pipeline parallel support to TransformersModel (#12832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-25 10:41:45 +08:00
Nicolò Lucchesi
69ff99fdcd [Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
2025-03-06 09:37:26 +00:00
Isotr0py
7864875879 [Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-27 17:30:39 +00:00
Russell Bryant
e489ad7a21 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
    This commit adds SPDX license headers to python source files as
    recommended to the project by the Linux Foundation. These headers
    provide a concise way that is both human and machine readable for
    communicating license information for each source file. It helps
    avoid any ambiguity about the license of the code and can also be
    easily used by tools to help manage license compliance.
    
    The Linux Foundation runs license scans against the codebase to help
    ensure we are in compliance with the licenses of the code we use,
    including dependencies. Having these headers in place helps that
    tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Harry Mellor
823ab79633 Update pre-commit hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
Cyrus Leung
ca47e176af [Misc] Move some model utils into vision file (#11848)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 17:04:46 +00:00
Cyrus Leung
91445c7bc8 [Bugfix] Fix image input for Pixtral-HF (#11741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 10:17:16 +08:00
Cyrus Leung
8c38ee7007 [VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-02 16:39:27 +00:00
Roger Wang
a11f326528 [V1] Initial support of multimodal models for V1 re-arch (#10699)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-08 12:50:51 +00:00
Cyrus Leung
bf0e382e16 [Model] Composite weight loading for multimodal Qwen2 (#10944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 07:22:52 -07:00
Cyrus Leung
133707123e [Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 08:02:54 +08:00
Cyrus Leung
fa6ecb9aa7 [Model] Clean up MiniCPMV (#10751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-29 04:47:06 +00:00
Roger Wang
2f0a0a17a4 [V1] Refactor model executable interface for multimodal models (#10570)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-26 20:46:11 +00:00
youkaichao
c055747867 [model][utils] add extract_layer_index utility function (#10599)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-23 22:22:54 -08:00
Mengqing Cao
8c1fb50705 [Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2024-11-19 11:22:26 +08:00
Isotr0py
e7ebb662d7 [Model] Remove transformers attention porting in VITs (#10414)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-18 21:45:21 +08:00
Isotr0py
c4e464333e [Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-18 09:07:46 +08:00
youkaichao
504ac53d18 [misc] error early for old-style class (#10304)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-13 18:55:39 -08:00
Cyrus Leung
0b8bb86bf1 [1/N] Initial prototype for multi-modal processor (#10044)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-13 12:39:03 +00:00
Cyrus Leung
51c2e1fcef [CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
youkaichao
1a95f10ee7 [5/N] pass the whole config to model (#9983)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 14:17:28 +08:00
youkaichao
3bb4befea7 [bugfix] fix tsts (#9959)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-02 15:54:05 -07:00
Peter Salas
6c0b7f548d [Core][VLM] Add precise multi-modal placeholder tracking (#8346)
Signed-off-by: Peter Salas <peter@fixie.ai>
2024-11-01 16:21:10 -07:00
Michael Goin
bc73e9821c [Bugfix] Fix prefix strings for quantized VLMs (#9772)
2024-10-29 16:02:59 -07:00
Cyrus Leung
836e8ef6ee [Bugfix] Fix PP for ChatGLM and Molmo (#9422)
2024-10-24 06:12:05 +00:00
Cyrus Leung
831540cf04 [Model] Support E5-V (#9576)
2024-10-23 11:35:29 +08:00
wangshuai09
3ddbe25502 [Hardware][CPU] using current_platform.is_cpu (#9536)
2024-10-22 00:50:43 -07:00