Lucas Wilkinson
|
70406eb1dc
|
[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-07 17:14:58 -04:00 |
|
Yubo Wang
|
08bfedc152
|
[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160)
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
|
2026-04-07 11:18:33 -07:00 |
|
Rishapveer Singh
|
da4c0e4db9
|
[Model] Use AutoWeightsLoader for FalconH1 (#39092)
Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com>
|
2026-04-07 16:25:17 +08:00 |
|
Netanel Haber
|
a9a0e0551f
|
nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len (#38727)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-07 00:23:29 -07:00 |
|
Netanel Haber
|
dfa5062a8f
|
NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config (#39032)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-06 19:47:46 +00:00 |
|
bnellnm
|
93bada494f
|
[MoE Refactor] Split of DefaultMoERunner class (#35326)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 12:41:59 -04:00 |
|
Wentao Ye
|
4ae218c122
|
[Refactor] Remove unused dead code (#38842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-06 11:52:05 -04:00 |
|
Lucas Wilkinson
|
47e605092b
|
[Gemma4] Enable Fast Prefill Optimization (#38879)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-06 11:19:39 -04:00 |
|
bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
liuchenbing2026
|
f6983f01de
|
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
Signed-off-by: liuchenbing <chenliumail@163.com>
Signed-off-by: liucb <liuchengbao_work@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
|
2026-04-05 19:50:18 -07:00 |
|
Netanel Haber
|
d56e952239
|
nano_nemotron_vl: fix tensor device mismatch exception when video profiling (#39029)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-05 22:23:45 +00:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
lalit10
|
93726b2a1c
|
Refactor Arctic loading to use AutoWeightsLoader (#38955)
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com>
Co-authored-by: Lalit Laxminarayan Bangad <lalitbangad@meta.com>
|
2026-04-04 05:01:09 +00:00 |
|
Yongye Zhu
|
8617f8676b
|
[Bugfix] Fix DSV32 weight loading (#38870)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2026-04-03 19:57:52 -07:00 |
|
elenalil-aws
|
81994e1d0e
|
[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… (#38927)
Signed-off-by: elenalil-aws <elenalil@amazon.com>
|
2026-04-03 23:30:09 +00:00 |
|
Netanel Haber
|
fa9e68022d
|
Fix Nano Nemotron VL regressions (#38655)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-03 15:22:06 +08:00 |
|
Isotr0py
|
5506435419
|
[Misc] Clean up Gemma4 implementation (#38872)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-04-03 05:47:02 +00:00 |
|
Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Vadim Gimpelson
|
771913e4a0
|
[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 (#38832)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-04-03 04:45:57 +04:00 |
|
1096125073
|
71a9125c67
|
[New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
|
2026-04-03 08:26:22 +08:00 |
|
Nicolò Lucchesi
|
66e86f1dbd
|
[Kernel] Mamba support different layout for Conv state (#37416)
|
2026-04-03 01:50:09 +02:00 |
|
Luciano Martins
|
08ed2b9688
|
feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-04-02 11:13:28 -07:00 |
|
bsliu
|
c0817e4d39
|
[Model] Add support for Cheers multimodal model (#38788)
Signed-off-by: bsliu <1187291748@qq.com>
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>
|
2026-04-02 21:01:40 +08:00 |
|
Harry Mellor
|
dfe5e31689
|
Don't compile vision encoder for Transformers backend (#30518)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-04-02 12:42:29 +00:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Benjamin Chislett
|
5f96f9aff1
|
[Perf] DSV3.2 Indexer Fused Weights Projection (#38684)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-04-02 03:34:49 +00:00 |
|
bnellnm
|
7cf56a59a2
|
[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-04-01 09:44:08 -04:00 |
|
Zhanda Zhu
|
c75a313824
|
[Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
|
2026-04-01 01:52:02 -07:00 |
|
Lukas Geiger
|
4f6eed3bd4
|
[Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-04-01 01:18:22 -07:00 |
|
Matthew Bonanni
|
116f4be405
|
[1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-04-01 04:08:01 +00:00 |
|
zhang-prog
|
b6e636c12c
|
[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
|
2026-03-31 15:50:41 +00:00 |
|
Netanel Haber
|
e812bf70bd
|
Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 (#38567)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
|
2026-03-30 21:56:52 +00:00 |
|
Benjamin Chislett
|
494636b29d
|
[Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-30 15:03:15 -04:00 |
|
Chendi.Xue
|
3b1dbaad4e
|
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-30 16:47:30 +00:00 |
|
roikoren755
|
8e6293e838
|
[Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-30 12:33:49 -04:00 |
|
Jee Jee Li
|
ac30a8311e
|
[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-29 23:59:42 -07:00 |
|
PikaPikachu
|
63babd17f1
|
[Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965)
Signed-off-by: kangletian <Letian.Kang@amd.com>
|
2026-03-30 14:24:06 +08:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
allgather
|
8c0b6267d7
|
[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410)
Signed-off-by: allgather <all2allops@gmail.com>
|
2026-03-29 09:59:06 +00:00 |
|
haosdent
|
d39b8daf5f
|
[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-29 00:27:52 +00:00 |
|
Xiaoshuang Wang
|
a8eab8f30d
|
[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
|
2026-03-27 14:13:21 +08:00 |
|
Chuan (Richard) Li
|
cb2263218e
|
[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-26 11:59:24 -04:00 |
|
zhang-prog
|
0f5b526040
|
[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
|
2026-03-26 15:34:49 +00:00 |
|
Jared Wen
|
757eafcf37
|
[bug-fix] GLM OCR Patch Merger context_dim (#37962)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
|
2026-03-26 05:11:21 -07:00 |
|
Cyrus Leung
|
502c41a8f6
|
[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 16:44:04 +08:00 |
|
Terry Gao
|
38de822310
|
[Model] Add torch.compile support for InternVL vision encoder (#38049)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-25 23:52:29 -07:00 |
|
Xin Yang
|
9704a5c310
|
Disable dual stream execution of input projection for Qwen3 (#38152)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-26 01:20:39 +00:00 |
|
Wei Zhao
|
74056039b7
|
Fix minimax m2.5 nvfp4 kv scales weight loading (#37214)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-26 00:48:06 +00:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|