Commit Graph

4316 Commits

Author SHA1 Message Date
Jhao-Ting Chen
5573894737 Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
2026-03-11 11:36:11 -04:00
Martin Hickey
700a1ddc65 [Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2026-03-11 13:37:46 +00:00
Wuxun Zhang
e584dce52b Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
2026-03-11 19:19:15 +08:00
Weiguang Li
724759684c [Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 03:13:06 -07:00
Rahul Tuli
9d07a3d6e4 Add: Eagle3 support for Qwen3.5 (#36658)
Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
2026-03-11 03:07:42 -07:00
Cyrus Leung
646b85544b [Refactor] Remove Molmo2 processor wrapper (#36667)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-11 03:07:20 -07:00
tc-mb
4286cc5ec2 fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751)
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
2026-03-11 03:06:28 -07:00
LoganJane
545d18d81b [Bugfix] Support other quantization methods in glm41v (#36321)
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-11 09:48:05 +00:00
Harry Mellor
f4ae58b38b Remove unused config field from Gemma2 (#36672)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 01:51:19 -07:00
Kunshang Ji
76c6e6da08 [XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-10 21:54:09 -07:00
Hongbin Guo
4bf533623b [Doc] Fix duplicate words in comments (#36713)
Signed-off-by: Hongbin10 <jdmjdm1998@163.com>
2026-03-10 21:28:31 -07:00
tunglinwood
42fadebecb [Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
2026-03-10 21:24:48 -07:00
tianshu-Michael-yu
a197eda9c3 Add tuned H100 MoE configs for LFM2 8B and 24B (#36699) 2026-03-10 21:22:02 -07:00
Augusto Yao
b386bb3d7c fix bugs when token_classify & classify run concurrently (#36614)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
2026-03-10 20:16:34 -07:00
Wei Zhao
84e436ed1c [Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-03-10 22:04:47 -04:00
Hashem Hashemi
721ae79f50 Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-03-10 09:14:27 -07:00
AllenDou
aefc59f088 FunASR model bugfix (#36633)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-03-10 08:14:21 -07:00
wang.yuqi
a3189a08b0 [Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-10 13:32:25 +00:00
amirkl94
156e33553c Fix: Re-Enable EP for trtllm MoE FP8 backend (#36494)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2026-03-09 23:11:27 -07:00
Hojin Yang
0836be3b03 [Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-10 10:59:19 +08:00
Ajay Anubolu
4e95ec111c [Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
2026-03-09 19:16:26 -07:00
Lucas Kabela
3fd03f1ec2 [BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-09 18:22:05 +00:00
Shaun Kotek
fa028207aa Fix/resupport nongated fused moe triton (#36412)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: liweiguang <codingpunk@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com>
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: nvnbagrov <nbagrov@nvidia.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weiguang Li <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: cong-or <conchubhar.gannon@gmail.com>
Co-authored-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-09 11:01:18 -07:00
SoluMilken
55d27cca55 [Misc] fix typo: dependant -> dependent (2 lines change) (#36511)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
2026-03-09 10:00:12 -07:00
Taoyu Zhu
70485a11bd [ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. (#36253)
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
2026-03-09 11:30:35 -05:00
Matthew Bonanni
77a73458e3 Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-09 07:17:14 -07:00
Tianyu Guo
5578f2a4d3 Support online use_audio_in_video (#36319)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-09 07:16:44 -07:00
Xin Yang
dc6b578466 [Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-08 23:41:01 -07:00
Cyrus Leung
d62856b928 [Misc] Move processors to transformers_utils (#35953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-09 11:31:39 +08:00
Alex Brooks
bd2659a566 Increase Flexibility for OOV Multimodal Token Handling (#34858)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-03-08 20:30:49 -07:00
Shaun Kotek
90512b2e8b fix: Use iterator as not to store all the file loads in memory at once (#36149)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
2026-03-08 20:25:21 -07:00
danisereb
0a6a3a1290 Add support for ModelOpt MXFP8 MoE models (#35986)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-03-08 13:00:05 -07:00
nvnbagrov
b7332b058c [Model] Nano Nemotron VL - fast media preprocessing (#35657)
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
2026-03-08 03:04:05 -07:00
Wei Zhao
379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) 2026-03-07 13:51:54 -08:00
rahul-sarvam
85f50eb41f Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
2026-03-08 01:16:24 +08:00
vllmellm
ee8a29511f [Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-03-07 09:26:59 +00:00
Itay Alroy
24a03915f5 mla: don't update kv cache on dummy forwards (#36282)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
2026-03-07 00:36:00 +00:00
eellison
f3c6c9c9d7 [CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
2026-03-06 10:53:37 -08:00
Isotr0py
1d0c0d209c [Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-03-06 06:06:45 -08:00
Andreas Karatzas
2a00d3241f [CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-06 01:17:08 -08:00
Russell Bryant
00bd08edee [Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-05 22:15:19 -08:00
Yanhong Li
a911f4dd20 [Model] Add support for OLMo Hybrid (#32550) 2026-03-05 14:51:06 -05:00
Xin Yang
f917020983 [Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-05 13:47:53 -05:00
tomeras91
86483ca774 [Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2026-03-05 09:49:05 -08:00
Netanel Haber
b93a9e6f6d ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2026-03-05 17:29:30 +00:00
Avery Miao
e998fa76b9 [BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)
Signed-off-by: Miao, Avery <avery.miao@intel.com>
2026-03-05 09:16:29 -08:00
Jiayi Yan
6a895197fa [Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 17:05:46 +00:00
AllenDou
3ee68590c7 refactor funasr model. (#36108)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-05 08:07:37 -08:00
Cyrus Leung
7196348157 [Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-05 08:07:19 -08:00
Harry Mellor
ecde7af9c4 Fix import that was moved in Transformers 5.2.0 (#36120)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 13:59:44 +00:00