Commit Graph

2279 Commits

Author SHA1 Message Date
roikoren755
3b30e61507 [NemotronH] Do not force router to run in fp32 (#34582)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-02-16 10:15:32 -08:00
Andreas Karatzas
03a8770a6d [ROCm][CI] Fix plugins test group; updating terratorch and dependencies (#34589)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-16 07:33:42 -08:00
Isotr0py
3bb4e4311c [Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-16 07:32:51 -08:00
Cyrus Leung
ec17bdd894 [Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-15 23:46:33 -08:00
Isotr0py
91ac5d9bfd [CI/Build] Enable tests for recent day-0 new models (#34585)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 18:17:04 -08:00
Luka Govedič
23d825aba1 [torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 06:33:57 -08:00
Isotr0py
71cd89264f [MM Encoder] Add Triton ViT attention backend (#32183)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 06:32:47 -08:00
Cyrus Leung
73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-14 10:14:21 -08:00
Kata Coder
d1ea65d0a1 [new model] add COLQwen3 code & Inference (#34398)
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
Signed-off-by: katacoder <craftsangjae@gmail.com>
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366 [CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 20:04:29 -08:00
Wei Zhao
b37b679770 [Feature][Perf] Support Selective CPU Weight Offloading (#34535)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-13 20:02:24 -08:00
Harry Huang
c027541eaf [Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-13 13:02:28 -08:00
Wei Zhao
59d53066d8 [Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b [Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501)
Signed-off-by: LoganJane <LoganJane73@hotmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-13 16:05:34 +00:00
Roger Wang
5885e330ef [Misc] Port Qwen3.5 Configs (#34512)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20 Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
2026-02-13 09:53:09 +00:00
myselvess
bcf0731aa0 [New Model] support new model ovis2.6 (#34426)
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>
2026-02-13 00:12:45 -08:00
Roger Wang
eea3024f43 [Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-12 22:48:42 -08:00
haosdent
dcf6ee8592 [Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-12 21:04:06 -08:00
Cyrus Leung
372b2e762a [Bugfix] Standardize getting number of image patches/tokens (#34358)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:47:01 -08:00
LoganJane
62788f99a4 [Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-12 18:18:42 -08:00
Patrick von Platen
6c0baee610 [Voxtral Realtime] Refactor & Improve buffering logic (#34428)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621 [Voxstral Realtime] Enable tests (#33803)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-02-12 09:43:24 -08:00
Harry Mellor
679ca5d8d3 Fix MoE for the Transformers modelling backend (#34436)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-12 09:29:42 -08:00
AllenDou
386bfe5d08 [bugfix] refactor FunASR's _get_data_parser (#34397)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-02-12 07:26:49 +00:00
Yichuan Wang
80f2ba6ea6 Fix DeepSeek-OCR tensor validation for all size variants (#34085)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 22:50:23 -08:00
Michael Goin
ff1f83b056 [Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 17:29:32 -08:00
Raushan Turganbay
527ca32197 [Bugfix] Fix more multimodal tests for transformers V5 (#34334)
Signed-off-by: raushan <raushan@huggingface.co>
2026-02-11 22:02:05 +01:00
elvischenv
83e26c834e [GPT-OSS] Remove unnecessary contiguous (#34337)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-02-11 15:29:29 -05:00
Eldar Kurtić
11c7ace340 [Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
2026-02-11 13:24:22 -05:00
Xinyu Dong
be7f3d5d20 [Bugfix] fix default is_neox_style is True for deepseek (#34353)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-11 18:20:45 +00:00
Isotr0py
0ab06100f4 [Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-11 09:37:40 -08:00
Harry Mellor
0f5e55e7a8 Make JAIS compatible with Transformers v5 (#34264)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-11 12:30:37 +00:00
Harry Mellor
1e9204bff3 Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-11 04:13:23 -08:00
Harry Mellor
40b8f55358 [Docs] Reduce time spent generating API docs (#34255)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-11 02:56:02 -08:00
AllenDou
21dfb842d7 [model] support FunASR model (#33247)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-02-11 07:37:09 +00:00
Roger Wang
0b20469c62 [Bugfix] Fix weight naming in Qwen3.5 (#34313)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-10 21:37:14 -08:00
Roger Wang
9615575afc [Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-10 13:12:31 -08:00
Andy Lo
ae871ca923 Minor cleanup for Voxtral (#34247)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-02-10 18:18:30 +00:00
xuebwang-amd
b129136c7a [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-10 10:08:05 -05:00
tc-mb
e042d7e685 Add flagos in MiniCPM-o (#34126)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com>
Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com>
2026-02-10 02:51:48 -08:00
Roger Wang
ae4e280602 [Bugfix] Fix FI kernelchunk_gated_delta_rule output shape for Qwen3.5 (#34219)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-10 10:41:24 +00:00
Roger Wang
047a457fa4 [Bugfix] Adopt ChunkGatedDeltaRule for Qwen3.5 (#34198)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-10 03:47:54 +00:00
Artus Krohn-Grimberghe
8fd31f6245 [Bugfix] Voxtral prompt/audio placeholder alignment (#34140)
Signed-off-by: Artus KG <artuskg@gmail.com>
2026-02-09 19:30:38 +00:00
Artus Krohn-Grimberghe
eadb4e868b [Bugfix] Avoid duplicate k-proj weight emission in helper (#34142)
Signed-off-by: Artus KG <artuskg@gmail.com>
2026-02-09 19:17:44 +00:00
Jiangyun Zhu
285bab4752 [Kernel] use flashinfer for gdn prefill (#32846)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-02-09 12:17:25 -05:00
JJJYmmm
9562912cea [MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm <1650675829@qq.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wulipc <wulipc@users.noreply.github.com>
Co-authored-by: ywang96 <ywang96@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-09 21:12:58 +08:00
Jee Jee Li
978a37c823 [Model] GLM adaptation (#34124)
2026-02-09 17:32:52 +08:00
Rohan Potdar
de3869bb4d move checks out of unified_kv_cache_update custom op (#33943)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-07 05:30:09 -08:00
whx
ce9b3cd3e9 [PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-02-07 05:26:05 -08:00