Lucas Kabela
|
714c6e0eab
|
[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-16 19:42:34 +00:00 |
|
Tianyu Guo
|
43a73f853b
|
Remove unused EVS functions in qwen3_vl.py (#37183)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2026-03-16 13:09:09 +00:00 |
|
Lukas Geiger
|
f9e6db3034
|
[Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 12:11:59 +00:00 |
|
Harry Mellor
|
ad041c79db
|
Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:31:16 +00:00 |
|
Harry Mellor
|
122f75d939
|
Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:20:37 +00:00 |
|
Vadim Gimpelson
|
8374387bd8
|
[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-16 09:04:29 +00:00 |
|
Isotr0py
|
912fbe9555
|
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 08:56:06 +00:00 |
|
bigshanedogg
|
2390d44209
|
[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
|
2026-03-16 06:40:05 +00:00 |
|
Jiangyun Zhu
|
697e4ff352
|
[GDN] add a config for gdn kernel selection (#36647)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-16 00:40:17 +08:00 |
|
Isotr0py
|
a8e8d62dd8
|
[Misc] Clean up Kimi-audio whisper encoder loading (#36903)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-14 23:37:52 +08:00 |
|
Harry Mellor
|
ffa5d74f15
|
Enable loading of fused expert weights in the Transformers modelling backend (#36997)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-14 07:01:06 +00:00 |
|
Dimitrios Bariamis
|
367cf5cd3e
|
[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
|
2026-03-13 16:41:16 -07:00 |
|
Benjamin Chislett
|
8b346309a5
|
[Refactor] Consolidate SupportsEagle (#36063)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-13 23:22:40 +00:00 |
|
Harry Mellor
|
0005d2a3c9
|
Use Transformers v5 WeightRenaming for Transformers modeling backend (#31545)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-13 20:49:08 +00:00 |
|
Isotr0py
|
abf61aaa8e
|
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-13 18:16:05 +00:00 |
|
bigmoyan
|
4508532fbd
|
[Bugfix] fix paddleocr crash on some image shape (#36959)
Signed-off-by: wangzhengtao <wangzhengtao@msh.team>
Signed-off-by: bigmoyan <moyan_work@foxmail.com>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-13 13:46:55 +00:00 |
|
Thomas Parnell
|
f296a1966d
|
[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876)
|
2026-03-13 07:09:39 +01:00 |
|
whyiug
|
1ce13cf992
|
[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385)
Signed-off-by: whyiug <whyiug@hotmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-13 03:23:53 +00:00 |
|
Nikita
|
10f08dedfa
|
[Model] Add ColPali late interaction model for multi-modal retrieval (#36818)
Signed-off-by: Nikita Sukharev <kaonael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-13 02:18:57 +00:00 |
|
Shubhra Pandit
|
87985077a4
|
[Speculative Decoding] Add norm_before_fc for gpt-oss draft models (#36545)
Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-12 23:03:32 +00:00 |
|
caozuoba
|
9e19f8338b
|
[Perf] add packed recurrent fast path for decode (#36596)
Signed-off-by: hdj <1293066020@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-12 04:01:57 -07:00 |
|
Shanshan Shen
|
f0d3658c0f
|
[MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels (#36605)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-12 03:28:23 -07:00 |
|
Xu Jinyang
|
3e64fe4a18
|
[Bugfix] Warm up Triton autotuner for GDN layers during V1 profiling (#36599)
Signed-off-by: AuYang <459461160@qq.com>
|
2026-03-12 00:51:09 -07:00 |
|
István Ketykó
|
00726c74c9
|
[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670)
Signed-off-by: István Ketykó <istvan.ketyko@gmail.com>
|
2026-03-12 15:35:54 +08:00 |
|
Harry Mellor
|
65986db6ba
|
Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 (#36787)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 18:12:43 +00:00 |
|
tianshu-Michael-yu
|
741f4e046b
|
fix: align lfm2 thumbnail token counting with HF (#36707)
|
2026-03-11 10:28:38 -07:00 |
|
Cyrus Leung
|
196802dfa6
|
[Misc] Clean up renderers (#36770)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 16:39:29 +00:00 |
|
Julien Denize
|
afebeffbfb
|
Add support to Mistral large 3 eagle with dense layers (#36163)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-11 15:42:56 +00:00 |
|
Jhao-Ting Chen
|
5573894737
|
Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
|
2026-03-11 11:36:11 -04:00 |
|
Weiguang Li
|
724759684c
|
[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 03:13:06 -07:00 |
|
Rahul Tuli
|
9d07a3d6e4
|
Add: Eagle3 support for Qwen3.5 (#36658)
Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
|
2026-03-11 03:07:42 -07:00 |
|
Cyrus Leung
|
646b85544b
|
[Refactor] Remove Molmo2 processor wrapper (#36667)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 03:07:20 -07:00 |
|
tc-mb
|
4286cc5ec2
|
fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751)
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
|
2026-03-11 03:06:28 -07:00 |
|
LoganJane
|
545d18d81b
|
[Bugfix] Support other quantization methods in glm41v (#36321)
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 09:48:05 +00:00 |
|
Harry Mellor
|
f4ae58b38b
|
Remove unused config field from Gemma2 (#36672)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 01:51:19 -07:00 |
|
Hongbin Guo
|
4bf533623b
|
[Doc] Fix duplicate words in comments (#36713)
Signed-off-by: Hongbin10 <jdmjdm1998@163.com>
|
2026-03-10 21:28:31 -07:00 |
|
tunglinwood
|
42fadebecb
|
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
|
2026-03-10 21:24:48 -07:00 |
|
AllenDou
|
aefc59f088
|
FunASR model bugfix (#36633)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-03-10 08:14:21 -07:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
Hojin Yang
|
0836be3b03
|
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-10 10:59:19 +08:00 |
|
Ajay Anubolu
|
4e95ec111c
|
[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
|
2026-03-09 19:16:26 -07:00 |
|
Lucas Kabela
|
3fd03f1ec2
|
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-09 18:22:05 +00:00 |
|
SoluMilken
|
55d27cca55
|
[Misc] fix typo: dependant -> dependent (2 lines change) (#36511)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-09 10:00:12 -07:00 |
|
Matthew Bonanni
|
77a73458e3
|
Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 07:17:14 -07:00 |
|
Tianyu Guo
|
5578f2a4d3
|
Support online use_audio_in_video (#36319)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-09 07:16:44 -07:00 |
|
Xin Yang
|
dc6b578466
|
[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-08 23:41:01 -07:00 |
|
Cyrus Leung
|
d62856b928
|
[Misc] Move processors to transformers_utils (#35953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-09 11:31:39 +08:00 |
|
Alex Brooks
|
bd2659a566
|
Increase Flexibility for OOV Multimodal Token Handling (#34858)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-03-08 20:30:49 -07:00 |
|
nvnbagrov
|
b7332b058c
|
[Model] Nano Nemotron VL - fast media preprocessing (#35657)
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
|
2026-03-08 03:04:05 -07:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|