2492 Commits

Author SHA1 Message Date
zhang-prog
b6e636c12c [Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
2026-03-31 15:50:41 +00:00
Netanel Haber
e812bf70bd Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 (#38567)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
2026-03-30 21:56:52 +00:00
Benjamin Chislett
494636b29d [Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-30 15:03:15 -04:00
Chendi.Xue
3b1dbaad4e [HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-30 16:47:30 +00:00
roikoren755
8e6293e838 [Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-30 12:33:49 -04:00
Jee Jee Li
ac30a8311e [Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-29 23:59:42 -07:00
PikaPikachu
63babd17f1 [Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965)
Signed-off-by: kangletian <Letian.Kang@amd.com>
2026-03-30 14:24:06 +08:00
Wentao Ye
995dea1354 [Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-29 18:12:50 +00:00
allgather
8c0b6267d7 [Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410)
Signed-off-by: allgather <all2allops@gmail.com>
2026-03-29 09:59:06 +00:00
haosdent
d39b8daf5f [Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-29 00:27:52 +00:00
Xiaoshuang Wang
a8eab8f30d [Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
2026-03-27 14:13:21 +08:00
Chuan (Richard) Li
cb2263218e [Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886)
Signed-off-by: Li <chuali@amd.com>
2026-03-26 11:59:24 -04:00
zhang-prog
0f5b526040 [Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
2026-03-26 15:34:49 +00:00
Jared Wen
757eafcf37 [bug-fix] GLM OCR Patch Merger context_dim (#37962)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
2026-03-26 05:11:21 -07:00
Cyrus Leung
502c41a8f6 [Model] Use helper function to run MM processors with token inputs (where applicable) (#38018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-26 16:44:04 +08:00
Terry Gao
38de822310 [Model] Add torch.compile support for InternVL vision encoder (#38049)
Signed-off-by: tianrengao <terrygao87@gmail.com>
2026-03-25 23:52:29 -07:00
Xin Yang
9704a5c310 Disable dual stream execution of input projection for Qwen3 (#38152)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-26 01:20:39 +00:00
Wei Zhao
74056039b7 Fix minimax m2.5 nvfp4 kv scales weight loading (#37214)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-03-26 00:48:06 +00:00
Harry Mellor
3c3c084240 Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-26 00:10:08 +00:00
Ekagra Ranjan
7b54f60db0 [Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-25 16:13:51 -07:00
Cyrus Leung
ba2f0acc2d [Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00
grYe99
7ac48fd357 [Model] Add AutoWeightsLoader support for jais (#38074)
Signed-off-by: grYe99 <guorongye99@gmail.com>
Co-authored-by: grYe99 <guorongye99@gmail.com>
2026-03-25 12:38:40 +00:00
Harry Mellor
d6bb2a9d9a Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-25 12:29:49 +00:00
Matthias Gehre
a889b7f584 [Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-03-25 11:42:58 +00:00
Nick Cao
935c46dd9b [Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
2026-03-24 18:23:41 +00:00
Wentao Ye
c59a132f96 [V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 20:10:11 -07:00
Yufeng He
ec2280611a [Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884)
2026-03-23 15:15:12 +00:00
Artem Perevedentsev
a16133a0f1 [Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
2026-03-23 00:37:58 -07:00
Hojin Yang
54ab804e87 [Bugfix] Store Qwen3Next A_log in fp32 (#37810)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-23 15:36:57 +08:00
r266-tech
02e6efe56d [Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' (#37820)
Co-authored-by: r266-tech <r266-tech@users.noreply.github.com>
2026-03-23 07:36:34 +00:00
Baorun (Lauren) Mu
f85e479e66 [Feature] ViT Full CUDA Graph (#35963)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
2026-03-23 13:01:10 +08:00
Lasha Koroshinadze
e7767eccae Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
2026-03-23 10:29:07 +08:00
Netanel Haber
e74c17e153 Enable NemotronHPuzzle + NemotronHMTP (#37803)
2026-03-22 15:13:58 +00:00
Yang Liu
b050700462 [Perf] Optimize glm4.xv VIT (#37779)
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-03-22 06:12:34 +00:00
Isotr0py
c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
Santino Ramos
85f671b8e1 [Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
2026-03-20 20:42:25 +00:00
Cyrus Leung
37aadf6237 [Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-20 18:30:22 +00:00
Le Yang
d7d2b5e405 [Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565)
Signed-off-by: Young-Leo <562593859@qq.com>
2026-03-20 18:28:34 +00:00
Rémi Delacourt
aa84e43ccb [Pixtral] Enable Pixtral language model support Eagle3 (#37182)
Signed-off-by: remi <remi@mistral.ai>
2026-03-20 15:50:15 +00:00
Ilya Boytsov
8b6c6b9505 [Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
2026-03-20 14:57:57 +00:00
wang.yuqi
ed359c497a [Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-20 08:07:56 +00:00
Wangbei25
0674d1fee7 [PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293)
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
2026-03-20 06:24:07 +00:00
Cyrus Leung
30108fc8b0 [Model] Refactor Step3-VL processor to HF style (#37579)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-20 06:05:08 +00:00
Jee Jee Li
8fbe3f303f [Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-20 11:09:32 +08:00
Jim Smith
4120a05ff1 Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448)
Signed-off-by: Jim Smith <jim@joshua8.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
2026-03-19 19:21:14 -04:00
Lucas Kabela
7769b58307 [torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-19 17:26:12 +00:00
Cyrus Leung
657855ab41 [Misc] Cleanup more configs and processors (#37560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 15:45:23 +00:00
Cyrus Leung
9515c20868 [Misc] Clean up processing logic (#37541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 13:30:20 +00:00
Cyrus Leung
7a6ebcbfcf [Model] Remove unnecessary get_language_model (#37545)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 20:00:36 +08:00
Cyrus Leung
765e461065 [Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 09:55:29 +00:00