Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574)
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
2026-02-20 20:01:40 -08:00
pougetat
11be2c74dc
[Realtime] Add Qwen3-ASR realtime streaming support (#34613)
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-20 19:59:42 -08:00
Lucas Wilkinson
0e22cd618b
Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers" (#34997)
2026-02-20 17:19:19 -08:00
Wei Zhao
ea5f903f80
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 13:37:31 -08:00
tianshu-Michael-yu
ea37530b47
[Models] LFM2: Support LoRA (#34921)
...
Co-authored-by: Piotr Mazurek <piotr635@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-19 22:07:23 -08:00
Wentao Ye
c683d11c94
[Refactor] Deprecate head_first for chunk_gated_delta_rule (#34263)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-19 13:23:49 -05:00
roikoren755
3eff45d793
Revert "[NemotronH] Do not force router to run in fp32 (#34582)" (#34808)
...
Signed-off-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-19 09:47:05 -08:00
Robert Shaw
4685a630a2
[Model Bash][DeepSeekR1] Remove Shared Expert Clone (#34344)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-19 07:56:14 -08:00
Eldar Kurtić
ee1d25f199
[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers (#34471)
...
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-19 07:55:41 -08:00
Linda
6fff24f30f
[Bugfix] Qwen3.5 kv-scale weight remapping (#34719)
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-02-19 04:13:37 -08:00
Tal Nir
f75b61a9e9
[Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862)
...
Signed-off-by: Tal Nir <tal@nervexneurotech.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:21:47 -08:00
Wei Zhao
7f51e93864
[Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix (#34876)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-18 23:20:30 -08:00
Alex Brooks
4611af1663
[Bugfix] Add Quant Config to Llava Next Projector (#34847)
...
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-02-18 23:18:23 -08:00
Manrique Vargas
ad5aa6bd9f
fix(docs): fix typos in comments and docstrings (#34836)
...
Signed-off-by: machov <mv1742@nyu.edu>
2026-02-18 23:17:41 -08:00
Isotr0py
c0bd8b13da
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
2026-02-18 09:46:53 -08:00
Robert Shaw
6874638bc4
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-18 07:42:36 -08:00
Michael Goin
909b147197
[Bugfix] Fix prefix creation for Qwen3.5 (#34723)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-17 23:39:15 -08:00
Cyrus Leung
a0d8d944e2
[Renderer] Move MM Hash parsing into Renderer (#34711)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-17 19:18:55 -08:00
Cyrus Leung
574fe75245
[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-17 05:29:01 -08:00
Jiangyun Zhu
1d65283e95
Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683)
2026-02-17 01:29:27 -08:00
roikoren755
3b30e61507
[NemotronH] Do not force router to run in fp32 (#34582)
...
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-02-16 10:15:32 -08:00
Andreas Karatzas
03a8770a6d
[ROCm][CI] Fix plugins test group; updating terratorch and dependencies (#34589)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-16 07:33:42 -08:00
Isotr0py
3bb4e4311c
[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-16 07:32:51 -08:00
Cyrus Leung
ec17bdd894
[Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-15 23:46:33 -08:00
Isotr0py
91ac5d9bfd
[CI/Build] Enable tests for recent day-0 new models (#34585)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 18:17:04 -08:00
Luka Govedič
23d825aba1
[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392)
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 06:33:57 -08:00
Isotr0py
71cd89264f
[MM Encoder] Add Triton ViT attention backend (#32183)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 06:32:47 -08:00
Cyrus Leung
73391a1baa
[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-14 10:14:21 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference (#34398)
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com>
Signed-off-by: katacoder <craftsangjae@gmail.com>
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366
[CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 20:04:29 -08:00
Wei Zhao
b37b679770
[Feature][Perf] Support Selective CPU Weight Offloading (#34535)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-13 20:02:24 -08:00
Harry Huang
c027541eaf
[Hybrid] Enable spec decoding in mamba cache align mode (#33705)
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-13 13:02:28 -08:00
Wei Zhao
59d53066d8
[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993)
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b
[Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501)
...
Signed-off-by: LoganJane <LoganJane73@hotmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-13 16:05:34 +00:00
Roger Wang
5885e330ef
[Misc] Port Qwen3.5 Configs (#34512)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones (#34170)
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
2026-02-13 09:53:09 +00:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 (#34426)
...
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>
2026-02-13 00:12:45 -08:00
Roger Wang
eea3024f43
[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-12 22:48:42 -08:00
haosdent
dcf6ee8592
[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483)
...
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-12 21:04:06 -08:00
Cyrus Leung
372b2e762a
[Bugfix] Standardize getting number of image patches/tokens (#34358)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:47:01 -08:00
LoganJane
62788f99a4
[Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-12 18:18:42 -08:00
Patrick von Platen
6c0baee610
[Voxtral Realtime] Refactor & Improve buffering logic (#34428)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621
[Voxstral Realtime] Enable tests (#33803)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-02-12 09:43:24 -08:00
Harry Mellor
679ca5d8d3
Fix MoE for the Transformers modelling backend (#34436)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-12 09:29:42 -08:00
AllenDou
386bfe5d08
[bugfix] refactor FunASR's _get_data_parser (#34397)
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-02-12 07:26:49 +00:00
Yichuan Wang
80f2ba6ea6
Fix DeepSeek-OCR tensor validation for all size variants (#34085)
...
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 22:50:23 -08:00
Michael Goin
ff1f83b056
[Refactor] Replace activation: str with MoEActivation enum (#33843)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 17:29:32 -08:00
Raushan Turganbay
527ca32197
[Bugfix] Fix more multimodal tests for transformers V5 (#34334)
...
Signed-off-by: raushan <raushan@huggingface.co>
2026-02-11 22:02:05 +01:00
elvischenv
83e26c834e
[GPT-OSS] Remove unnecessary contiguous (#34337)
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-02-11 15:29:29 -05:00
Eldar Kurtić
11c7ace340
[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
...
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
2026-02-11 13:24:22 -05:00