Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
Chuan (Richard) Li
|
5dc3538736
|
[ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported (#35893)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-04 08:30:54 +00:00 |
|
Nathan Price
|
36bf213181
|
[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (#35869)
Signed-off-by: Nathan Price <nathan@abridge.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-04 08:29:01 +00:00 |
|
Andrii Skliar
|
5d199ac8f2
|
Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster>
Co-authored-by: Andrii <askliar@nvidia.com>
Co-authored-by: root <root@pool0-03748.cm.cluster>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: root <root@pool0-02416.cm.cluster>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: root <root@pool0-04880.cm.cluster>
|
2026-03-03 23:20:33 -08:00 |
|
Andreas Karatzas
|
edba15045a
|
[Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions (#35711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-04 04:12:51 +00:00 |
|
Isotr0py
|
6e9f21e8a2
|
[Chore] Remove debug code in model implementation (#35883)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:50:58 -08:00 |
|
AllenDou
|
c1d963403c
|
[model] support FireRedASR2 (#35727)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:41:30 -08:00 |
|
Shanshan Shen
|
77e6dcbbfa
|
[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-03-03 19:41:27 -08:00 |
|
William Zhang
|
70c73df69e
|
[Bugfix] Fix EVS implementation for Qwen3 VL (#33607)
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
|
2026-03-04 02:18:11 +00:00 |
|
xjx
|
9a9d442464
|
Enable bnb for multiple indices weight (#35838)
Signed-off-by: xjx <493337577@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-04 01:46:47 +00:00 |
|
bnellnm
|
a9b8b13e5c
|
[Bugfix] Fix misnamed parameter in compressed_tensors_moe.py (#35813)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-03 16:29:57 -05:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Lucas Wilkinson
|
28ef9ba399
|
[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-03 07:21:57 -08:00 |
|
lin-shh
|
35a6f0bfe2
|
[Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648)
|
2026-03-02 21:59:14 -08:00 |
|
Isotr0py
|
8ea8ba275e
|
[V0 deprecation] Remove Swin model (#35821)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-02 20:03:41 -08:00 |
|
Jakub Zakrzewski
|
c8b678e53e
|
[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-03-03 08:32:14 +08:00 |
|
Robert Shaw
|
9319044ee9
|
[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-02 23:03:49 +00:00 |
|
Ye (Charlotte) Qi
|
fa6a6be519
|
[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2026-03-02 21:11:56 +00:00 |
|
Fynn Schmitt-Ulms
|
9433acb8df
|
[Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-03-02 14:29:09 -05:00 |
|
CSWYF3634076
|
2a9e3347e9
|
[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2026-03-02 18:56:33 +00:00 |
|
Turner Jabbour
|
4034c3d32e
|
[Core] Move test utility to test file (#35672)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
|
2026-03-02 10:56:03 -05:00 |
|
ElizaWszola
|
d9c7730877
|
[Performance] Extract kv update ops from MLA attention backends (#34627)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Di Wu <dw2761@nyu.edu>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-02 10:43:19 -05:00 |
|
Kunshang Ji
|
92f5d0f070
|
[XPU] fix mxfp4 activation type (#35691)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-02 11:48:39 +08:00 |
|
Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
Richard Zou
|
e82fbeec7b
|
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-01 21:44:22 +00:00 |
|
haosdent
|
6290470843
|
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-01 15:14:46 -05:00 |
|
Seungho Yoon
|
5a435507d8
|
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382)
Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com>
|
2026-03-01 09:59:30 -05:00 |
|
Asaf Gardin
|
bbf81f9a92
|
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-01 20:40:23 +08:00 |
|
lin-shh
|
a9ec392c86
|
Fix typo: implictly -> implicitly in isaac.py docstring (#35646)
|
2026-02-28 23:34:37 -08:00 |
|
lailoo
|
afd089f231
|
[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs (#35617)
|
2026-03-01 03:27:37 +00:00 |
|
Isotr0py
|
e94b263bd6
|
[Chore] Cleanup BNB utilization dead code (#35620)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-28 19:22:41 +00:00 |
|
Wentao Ye
|
e113a30113
|
[Deprecation] Deprecate code in 0.17 as scheduled (#35441)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-28 17:32:37 +00:00 |
|
cwazai
|
63d7972f13
|
Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj (#35581)
|
2026-02-28 14:50:55 +00:00 |
|
Chauncey
|
7e08c22b8c
|
[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#35271)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-28 10:12:00 +00:00 |
|
Hashem Hashemi
|
7600642eae
|
Add padding support to wvSplitK solution for skinny GEMMs (#33762)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-28 09:02:05 +00:00 |
|
Itay Alroy
|
dea268336f
|
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-02-28 04:46:42 +00:00 |
|
Matthew Bonanni
|
2562e0271e
|
[MTP] Validate that MTP weights are actually loaded (#35548)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-28 12:27:40 +08:00 |
|
Douglas Lehr
|
d5b6f3ba36
|
[ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
|
2026-02-28 03:37:01 +00:00 |
|
Jakub Zakrzewski
|
1f3dbd95fd
|
[Bugfix][Model] Fix gpt-oss batch invariance (#35404)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-02-27 20:41:24 +00:00 |
|
SteadfastAsArt
|
2decec9856
|
[Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 (#34888)
Signed-off-by: SteadfastAsArt <695488173@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-27 19:39:23 +00:00 |
|
Raushan Turganbay
|
fd6de37fca
|
[BugFix] Fix 3D rope in transformers backend (#35097)
Signed-off-by: raushan <raushan@huggingface.co>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-27 18:34:49 +00:00 |
|
Netanel Haber
|
c8aca0c9e1
|
Support parakeet as audio encoder for nemotron-nano-vl (#35100)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 11:07:38 -07:00 |
|
fort726
|
905d76b51d
|
[Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-27 09:26:02 -08:00 |
|
Yueqian Lin
|
e8249378e4
|
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 06:48:25 -08:00 |
|
Tib
|
6467b635b6
|
[Bugfix] Add missing activation attr to RMSNormGated (#35423)
Signed-off-by: tibG <naps@qubes.milou>
Co-authored-by: tibG <naps@qubes.milou>
|
2026-02-27 12:53:35 +00:00 |
|
Max Hu
|
9c3fe9936b
|
Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
|
2026-02-27 20:20:23 +08:00 |
|
Chengyi Nie
|
a572baff5e
|
[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457)
Signed-off-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Chengyi Nie <cnie@roblox.com>
|
2026-02-27 13:51:14 +08:00 |
|
zofia
|
516cf26698
|
[Bug] correct out dtype of rms_norm_gated native path (#35369)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-27 05:19:51 +00:00 |
|
Wentao Ye
|
062b789632
|
[Bug] Fix outdated links in source code (#35314)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-27 03:50:46 +00:00 |
|
Jee Jee Li
|
1e5ad9b74f
|
[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping (#35413)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-26 19:46:30 -08:00 |
|