Cyrus Leung
|
9fb52e523a
|
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 09:54:36 -07:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
wang.yuqi
|
2e26f9156a
|
[Model][3/N] Automatic conversion of CrossEncoding model (#20168)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-04 05:47:39 -07:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
wang.yuqi
|
6f1229f91d
|
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-03 13:59:23 +00:00 |
|
CSWYF3634076
|
e303dcf523
|
[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-07-02 03:37:01 -07:00 |
|
Kwai-Keye
|
8452946c06
|
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
|
2025-07-01 23:35:04 -07:00 |
|
aiyiwang2025
|
ecad851cbd
|
[Model]Add Tencent HunYuanMoEV1 Model Support (#20114)
Signed-off-by: aiyiwang <aiyiwang@tencent.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-01 07:28:13 -07:00 |
|
Yuxuan Zhang
|
ed70f3c64f
|
Add GLM4.1V model (Draft) (#19331)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-01 12:48:26 +00:00 |
|
Li, Jiang
|
6cc1e7d96d
|
[CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-01 07:25:03 +00:00 |
|
redmoe-moutain
|
65b1cbb138
|
[Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
|
2025-06-29 19:34:36 -07:00 |
|
Stan Wozniak
|
daec9dea6e
|
[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
|
2025-06-28 08:16:41 -07:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Michael Goin
|
c329ceca6d
|
[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-28 13:43:06 +08:00 |
|
Robert Shaw
|
d1c956dc0f
|
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-27 07:16:26 +00:00 |
|
wang.yuqi
|
cd4cfee689
|
[Model][1/N] Automatic conversion of CrossEncoding model (#20012)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-06-26 21:10:04 -07:00 |
|
Thomas Parnell
|
e110930680
|
[Fix] Fix gemma CI test failing on main (#20124)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-26 21:06:59 -07:00 |
|
Michael Goin
|
71799fd005
|
[CI Failure] Fix OOM with test_oot_registration_embedding (#20144)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 11:21:04 +08:00 |
|
Bowen Wang
|
e9fd658a73
|
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
|
2025-06-26 15:30:21 -07:00 |
|
Li, Jiang
|
0567c8249f
|
[CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-26 03:34:47 -07:00 |
|
Michael Goin
|
754b00edb3
|
[Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-26 01:01:17 +00:00 |
|
Li, Jiang
|
53da4cd397
|
[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-24 13:20:04 +00:00 |
|
cascade
|
e6327c9b3e
|
[Feature] Support sequence parallelism for static fp8 quantization (#19181)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-06-23 16:09:02 -04:00 |
|
Isotr0py
|
61f4fc5dc6
|
[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-23 18:38:06 +00:00 |
|
汪志鹏
|
c3bf9bad11
|
[New model support]Support Tarsier2 (#19887)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-21 04:01:51 +00:00 |
|
Li, Jiang
|
79f2f1c2a1
|
[CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-20 15:30:36 +00:00 |
|
Adrian
|
f1e840e842
|
[Model] GPT2ForSequenceClassification model (#19663)
Signed-off-by: nie3e <adrcwiek@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:07:41 +00:00 |
|
Yu-Hang "Maxin" Tang
|
83ca9ae47b
|
Mark invariant normalizer in Gemma as non-persistent (#19788)
Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
|
2025-06-18 22:56:03 -07:00 |
|
Maximilien de Bayser
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
Chen Zhang
|
a89209b78d
|
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-18 20:34:15 +00:00 |
|
Isotr0py
|
ca94d7fa00
|
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-17 15:58:38 +00:00 |
|
qscqesze
|
387bdf0ab9
|
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
|
2025-06-16 09:47:14 -07:00 |
|
wang.yuqi
|
f40f763f12
|
[CI] Add mteb testing for rerank models (#19344)
|
2025-06-16 01:36:43 -07:00 |
|
Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
Michael Goin
|
1e473b3010
|
[CI] Disable failing GGUF model test (#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 05:12:38 +00:00 |
|
wang.yuqi
|
3952731e8f
|
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
|
2025-06-10 20:07:30 -07:00 |
|
Luka Govedič
|
2d8476e465
|
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-07 10:34:51 -07:00 |
|
Isotr0py
|
d2f0e7e615
|
[CI/Build] Improve Llama GGUF test robustness (#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-07 17:23:28 +08:00 |
|
Luis Vega
|
cb6d572e85
|
[Model] NemotronH support (#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
|
2025-06-05 21:29:28 +00:00 |
|
Cyrus Leung
|
01dc9a76db
|
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 04:49:20 -07:00 |
|
wang.yuqi
|
35cf32df30
|
Improve the output precision of embedding models (#19092)
|
2025-06-04 11:48:57 +00:00 |
|
Li, Jiang
|
4555143ea7
|
[CPU] V1 support for the CPU backend (#16441)
|
2025-06-03 18:43:01 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
汪志鹏
|
1282bd812e
|
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-03 13:13:13 +08:00 |
|
Cyrus Leung
|
6aa8f9a4e7
|
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-01 11:04:23 +08:00 |
|
Shawn Huang
|
e1fadf1197
|
[Feature] minicpm eagle support (#18943)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
|
2025-05-30 06:45:56 -07:00 |
|
Isotr0py
|
c9479b2920
|
[Bugfix] Fix the failing gte embedding test (#18720)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-29 07:39:25 -07:00 |
|
wang.yuqi
|
de65fc8e1e
|
[CI] improve embed testing (#18747)
|
2025-05-28 00:16:35 -07:00 |
|
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
|
Isotr0py
|
1f1b1bc03b
|
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-27 04:40:28 +00:00 |
|