Fynn Schmitt-Ulms
9433acb8df
[Spec Decode] Add hidden states extraction system ( #33736 )
...
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com >
2026-03-02 14:29:09 -05:00
Isotr0py
cc0d565f40
[CI/Build] Enable Qwen3.5 tests on CI ( #35763 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-02 17:43:53 +00:00
Andreas Karatzas
c34963f138
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism ( #35152 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 15:04:18 +08:00
Itay Alroy
dea268336f
[1/N] Elastic EP Milestone 2 ( #34861 )
...
Signed-off-by: Yongji Wu <wuyongji317@gmail.com >
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com >
Co-authored-by: Yongji Wu <wuyongji317@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-02-28 04:46:42 +00:00
fort726
905d76b51d
[Model] Add huggingface skt/A.X-K1 model ( #32407 )
...
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com >
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-27 09:26:02 -08:00
Yueqian Lin
e8249378e4
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests ( #35487 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 06:48:25 -08:00
Jakub Zakrzewski
111d869069
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model ( #35297 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-26 14:17:17 +00:00
Cyrus Leung
845ee348ef
[Misc] Standardize handling of mm_processor_kwargs.size ( #35284 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-26 13:05:46 +00:00
Yueqian Lin
c0615a296d
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression ( #35368 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
2026-02-26 11:58:23 +00:00
Jiangyun Zhu
ab87f85231
[Model] Ring 2.5 ( #35102 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-26 02:17:11 -08:00
Andreas Karatzas
8fd6975479
[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results ( #35049 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 16:48:37 +00:00
Harry Mellor
f7967577f5
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM ( #35203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 22:00:06 -08:00
Benjamin Chislett
f5972a872f
[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support ( #33726 )
...
Signed-off-by: Shahar Mor <smor@nvidia.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Shahar Mor <smor@nvidia.com >
Co-authored-by: Roi Koren <roik@nvidia.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-24 09:49:56 -08:00
Isotr0py
d12d201409
[Bugfix] Fix failing FunASR processor test ( #35111 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 04:13:45 -08:00
eustlb
b3ad37c5db
[glm-asr] change defaults dummy audio size ( #35108 )
...
Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com >
2026-02-24 04:13:33 -08:00
Cyrus Leung
392645454b
[Refactor] Decouple TimingContext from InputProcessingContext ( #35083 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-23 14:15:50 +00:00
Eldar Kurtić
1e8438a89a
[Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests ( #35033 )
...
Signed-off-by: Eldar Kurtic <you@example.com >
Co-authored-by: Eldar Kurtic <you@example.com >
2026-02-23 09:04:34 -05:00
Neil Schemenauer
54e2f83d0a
[Feature] Lazy import for the "mistral" tokenizer module. ( #34651 )
...
Signed-off-by: Neil Schemenauer <nas@arctrix.com >
2026-02-23 00:43:01 -08:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
Wentao Ye
d24bdd7c4b
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests ( #34961 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-21 20:23:24 -08:00
Roman
98b0205c3c
[Frontend] Add automatic language detection for Whisper transcription ( #34342 )
...
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de >
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-21 04:49:41 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
Andreas Karatzas
89358f0d35
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor ( #34567 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:12:05 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00
pougetat
11be2c74dc
[Realtime] Add Qwen3-ASR realtime streaming support ( #34613 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-20 19:59:42 -08:00
Lucas Wilkinson
aaefc58ee0
[CI] Revert PRs 34818 and 33600 ( #34979 )
2026-02-20 13:25:50 -08:00
Matthew Bonanni
662205d34e
[Bugfix] Fix Basic Models Test ( #34818 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-19 14:49:07 -08:00
Andreas Karatzas
cef65f0715
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL ( #34753 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-18 03:59:53 +00:00
almayne
6bd6d0c3c1
Fixed whisper CPU test that does not spawn properly. ( #34324 )
...
Signed-off-by: Anna Mayne <anna.mayne@arm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-17 06:46:23 -08:00
Cyrus Leung
574fe75245
[Renderer] Move InputPreprocessor into Renderer (2/2) ( #34560 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-17 05:29:01 -08:00
Isotr0py
91ac5d9bfd
[CI/Build] Enable tests for recent day-0 new models ( #34585 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 18:17:04 -08:00
Christian Pinto
342a7cda2d
[Misc] Update tests and examples for Prithvi/Terratorch models ( #34416 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366
[CI] Heavy refactoring of Voxtral multimodal audio model tests ( #34294 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:04:29 -08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 ( #34426 )
...
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com >
2026-02-13 00:12:45 -08:00
Cyrus Leung
1b4e8e53f8
[CI/Build] Fix CUDA re-initialization error in distributed model tests ( #34491 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-13 06:43:53 +00:00
Cyrus Leung
372b2e762a
[Bugfix] Standardize getting number of image patches/tokens ( #34358 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 20:47:01 -08:00
Patrick von Platen
6c0baee610
[Voxtral Realtime] Refactor & Improve buffering logic ( #34428 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621
[Voxstral Realtime] Enable tests ( #33803 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2026-02-12 09:43:24 -08:00
Isotr0py
becbe24808
[Bugfix] Remove broken raw url GGUF model loading support ( #34433 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-12 09:40:01 -08:00
Cyrus Leung
b96f7314b4
[Refactor] Pass Renderer to Input Processor ( #34329 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 19:38:11 -08:00
Raushan Turganbay
527ca32197
[Bugfix] Fix more multimodal tests for transformers V5 ( #34334 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2026-02-11 22:02:05 +01:00
Harry Mellor
67a42b5a44
Don't try and run GLM-ASR with remote code ( #34352 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 08:09:40 -08:00
AllenDou
21dfb842d7
[model] support FunASR model ( #33247 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-11 07:37:09 +00:00
xuebwang-amd
b129136c7a
[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations ( #29008 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-10 10:08:05 -05:00
Chen Zhang
97fa8f6590
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers ( #29387 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2026-02-10 07:41:16 +00:00
JJJYmmm
9562912cea
[MODEL] Adding Support for Qwen3.5 Models ( #34110 )
...
Signed-off-by: JJJYmmm <1650675829@qq.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wulipc <wulipc@users.noreply.github.com >
Co-authored-by: ywang96 <ywang96@users.noreply.github.com >
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-09 21:12:58 +08:00
Andreas Karatzas
3025b3cebb
[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr ( #34107 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-09 17:37:04 +08:00