Jhao-Ting Chen
|
5573894737
|
Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
|
2026-03-11 11:36:11 -04:00 |
|
Martin Hickey
|
700a1ddc65
|
[Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-11 13:37:46 +00:00 |
|
Wuxun Zhang
|
e584dce52b
|
Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
|
2026-03-11 19:19:15 +08:00 |
|
Weiguang Li
|
724759684c
|
[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 03:13:06 -07:00 |
|
Rahul Tuli
|
9d07a3d6e4
|
Add: Eagle3 support for Qwen3.5 (#36658)
Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
|
2026-03-11 03:07:42 -07:00 |
|
Cyrus Leung
|
646b85544b
|
[Refactor] Remove Molmo2 processor wrapper (#36667)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 03:07:20 -07:00 |
|
tc-mb
|
4286cc5ec2
|
fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751)
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
|
2026-03-11 03:06:28 -07:00 |
|
LoganJane
|
545d18d81b
|
[Bugfix] Support other quantization methods in glm41v (#36321)
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 09:48:05 +00:00 |
|
Harry Mellor
|
f4ae58b38b
|
Remove unused config field from Gemma2 (#36672)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 01:51:19 -07:00 |
|
Kunshang Ji
|
76c6e6da08
|
[XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-10 21:54:09 -07:00 |
|
Hongbin Guo
|
4bf533623b
|
[Doc] Fix duplicate words in comments (#36713)
Signed-off-by: Hongbin10 <jdmjdm1998@163.com>
|
2026-03-10 21:28:31 -07:00 |
|
tunglinwood
|
42fadebecb
|
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
|
2026-03-10 21:24:48 -07:00 |
|
tianshu-Michael-yu
|
a197eda9c3
|
Add tuned H100 MoE configs for LFM2 8B and 24B (#36699)
|
2026-03-10 21:22:02 -07:00 |
|
Augusto Yao
|
b386bb3d7c
|
fix bugs when token_classify & classify run concurrently (#36614)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2026-03-10 20:16:34 -07:00 |
|
Wei Zhao
|
84e436ed1c
|
[Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-10 22:04:47 -04:00 |
|
Hashem Hashemi
|
721ae79f50
|
Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-03-10 09:14:27 -07:00 |
|
AllenDou
|
aefc59f088
|
FunASR model bugfix (#36633)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-03-10 08:14:21 -07:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
amirkl94
|
156e33553c
|
Fix: Re-Enable EP for trtllm MoE FP8 backend (#36494)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2026-03-09 23:11:27 -07:00 |
|
Hojin Yang
|
0836be3b03
|
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-10 10:59:19 +08:00 |
|
Ajay Anubolu
|
4e95ec111c
|
[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
|
2026-03-09 19:16:26 -07:00 |
|
Lucas Kabela
|
3fd03f1ec2
|
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-09 18:22:05 +00:00 |
|
Shaun Kotek
|
fa028207aa
|
Fix/resupport nongated fused moe triton (#36412)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: liweiguang <codingpunk@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com>
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: nvnbagrov <nbagrov@nvidia.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weiguang Li <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: cong-or <conchubhar.gannon@gmail.com>
Co-authored-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-09 11:01:18 -07:00 |
|
SoluMilken
|
55d27cca55
|
[Misc] fix typo: dependant -> dependent (2 lines change) (#36511)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-09 10:00:12 -07:00 |
|
Taoyu Zhu
|
70485a11bd
|
[ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. (#36253)
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
|
2026-03-09 11:30:35 -05:00 |
|
Matthew Bonanni
|
77a73458e3
|
Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 07:17:14 -07:00 |
|
Tianyu Guo
|
5578f2a4d3
|
Support online use_audio_in_video (#36319)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-09 07:16:44 -07:00 |
|
Xin Yang
|
dc6b578466
|
[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-08 23:41:01 -07:00 |
|
Cyrus Leung
|
d62856b928
|
[Misc] Move processors to transformers_utils (#35953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-09 11:31:39 +08:00 |
|
Alex Brooks
|
bd2659a566
|
Increase Flexibility for OOV Multimodal Token Handling (#34858)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-03-08 20:30:49 -07:00 |
|
Shaun Kotek
|
90512b2e8b
|
fix: Use iterator as not to store all the file loads in memory at once (#36149)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
|
2026-03-08 20:25:21 -07:00 |
|
danisereb
|
0a6a3a1290
|
Add support for ModelOpt MXFP8 MoE models (#35986)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-03-08 13:00:05 -07:00 |
|
nvnbagrov
|
b7332b058c
|
[Model] Nano Nemotron VL - fast media preprocessing (#35657)
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
|
2026-03-08 03:04:05 -07:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
rahul-sarvam
|
85f50eb41f
|
Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
|
2026-03-08 01:16:24 +08:00 |
|
vllmellm
|
ee8a29511f
|
[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-07 09:26:59 +00:00 |
|
Itay Alroy
|
24a03915f5
|
mla: don't update kv cache on dummy forwards (#36282)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
|
2026-03-07 00:36:00 +00:00 |
|
eellison
|
f3c6c9c9d7
|
[CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
|
2026-03-06 10:53:37 -08:00 |
|
Isotr0py
|
1d0c0d209c
|
[Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-06 06:06:45 -08:00 |
|
Andreas Karatzas
|
2a00d3241f
|
[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 01:17:08 -08:00 |
|
Russell Bryant
|
00bd08edee
|
[Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-05 22:15:19 -08:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Xin Yang
|
f917020983
|
[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-05 13:47:53 -05:00 |
|
tomeras91
|
86483ca774
|
[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-05 09:49:05 -08:00 |
|
Netanel Haber
|
b93a9e6f6d
|
ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-05 17:29:30 +00:00 |
|
Avery Miao
|
e998fa76b9
|
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)
Signed-off-by: Miao, Avery <avery.miao@intel.com>
|
2026-03-05 09:16:29 -08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
AllenDou
|
3ee68590c7
|
refactor funasr model. (#36108)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:07:37 -08:00 |
|
Cyrus Leung
|
7196348157
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 08:07:19 -08:00 |
|
Harry Mellor
|
ecde7af9c4
|
Fix import that was moved in Transformers 5.2.0 (#36120)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 13:59:44 +00:00 |
|