Yueqian Lin
f8516a1ab9
[Bugfix][Model] Fix audio-in-video support for Qwen2.5-Omni and Qwen3-Omni (#33605)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-04 12:15:29 +00:00
Vadim Gimpelson
824058076c
[PERF] Change GDN Attention State Layout from [N, HV, K, V] to [N, HV, V, K] (#33291)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-02-04 11:20:52 +00:00
Kunshang Ji
f79f777803
[XPU][2/N] add support unquantized moe support for xpu (#33659)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-04 02:12:25 -08:00
Frank Wang
45f8fd6f97
[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
2026-02-04 13:27:34 +08:00
Michael Goin
eb5ed20743
[Bugfix] Define router_logits_dtype for remaining MoE models (#33737)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-04 13:24:14 +08:00
Shanshan Shen
9fb27dd3b3
[MM] Align the prefix of MMEncoderAttention with Attention (#33750)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-02-04 04:07:30 +00:00
Matthew Bonanni
bd8da29a66
[Bugfix] Fix sparse MLA metadata building (#33579)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-03 15:29:48 -08:00
Michael Goin
2a99c5a6c8
[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-03 13:26:51 -08:00
Vadim Gimpelson
a372f3f40a
[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-02-03 15:10:31 -05:00
Patrick von Platen
f0d5251715
[Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-03 07:22:34 -08:00
Shanshan Shen
5c4f2dd6ef
[MM] Pass prefix parameter to MMEncoderAttention (#33674)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-02-03 06:47:41 -08:00
Harry Mellor
2a8d84e66d
Fix Gemma3n audio encoder for Transformers v5 (#33673)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 05:49:49 -08:00
zxy
a3acfa1071
[Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-03 05:49:45 -08:00
Song Zhixin
ceab70c89d
[Bugfix] fix qwen3-asr response error (#33644)
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-03 03:33:56 -08:00
Michael Goin
e346e2d056
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-03 10:37:15 +00:00
Kunshang Ji
e10604480b
[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-02 22:46:10 -08:00
Shengliang Xu
f1cb9b5544
Fix quantized Falcon-H1 model loading issues (#32728)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-02 22:31:27 -08:00
Patrick von Platen
5019c59dd2
[Voxtral Realtime] Introduce global log mel max (#33574)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 17:01:47 -05:00
Vasiliy Kuznetsov
0130223bd9
fix memory for online fp8 quantization with streaming weight load (#31914)
Signed-off-by: vasiliy <vasiliy@fb.com>
2026-02-02 14:17:42 -05:00
Yang Liu
199e3cb476
[Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039)
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-02-02 16:55:48 +00:00
Isotr0py
4061dcf4c5
[Bugfix] Enable Kimi k25 processor test (#33562)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 14:25:25 +00:00
danielafrimi
0aca8b8c62
[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
2026-02-02 09:18:50 -05:00
Cyrus Leung
b10d05b8a8
[Model] Use explicit types in get_generation_prompt (#33551)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 12:38:49 +00:00
Borushiki
b398e5c819
Update get_expert_mapping to include self parameter (#33525)
Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com>
2026-02-02 20:29:07 +08:00
Grzegorz K. Karch
78061ef584
Fix accessing hidden_act from model config (#32686)
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
2026-02-02 11:11:33 +00:00
RED
808dd87b30
[Model] Support DeepSeek-OCR-2 (#33165)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 06:24:10 +00:00
csy0225
c3b40dc3e7
[Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-02 10:21:18 +08:00
will b.
46b4a02794
Fix DeepSeek V2 RoPE initialization error (#33501)
Signed-off-by: Eduardo Salinas <edus@microsoft.com>
Signed-off-by: catswe <212922539+catswe@users.noreply.github.com>
Co-authored-by: Eduardo Salinas <edus@microsoft.com>
2026-02-01 21:00:56 +00:00
shaharmor98
8869cd8ec1
Add MoE config for Super B200 TP2 (#33510)
2026-02-01 18:48:37 +00:00
JartX
cd86fff38f
[BUGFIX] Fix hipErrorIllegalState in Qwen3-Omni during startup profiling allow inference Omni on ROCM (#33077)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-02-01 13:36:25 +00:00
Maral
b5f8c3092d
[W8A8 Block Linear Refactor][1/N] Keep all quantization types into QuantFP8 class. (#33047)
Signed-off-by: maral <maralbahari.98@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-01 09:28:01 +00:00
Eduardo Salinas
302ecf64ff
[Models]: lfm2_siglip2 return intermediate encoder layers (#33370)
Signed-off-by: Eduardo Salinas <edus@microsoft.com>
2026-02-01 06:17:49 +00:00
René Honig
079781177a
fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2026-01-31 14:06:42 -08:00
Roy Wang
63c0889416
[Misc] Fix flashinfer related tests (#33462)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-01-31 16:10:24 -05:00
Cyrus Leung
88c3e114d8
[Refactor] Move MM data parsing outside processor (#33408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 16:46:14 +00:00
Cyrus Leung
92924b2ddd
[Deprecation] Remove deprecated items related to pooling (#33477)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 08:44:40 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
13b842f271
[BugFix][Router Replay] Capture Logical Experts with EPLB (#33013)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-31 10:12:17 -05:00
Cyrus Leung
f0a1c8453a
[Frontend] Use new Renderer for Completions and Tokenize API (#32863)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 04:51:15 -08:00
Jinwu
f68e3ea4e1
[BugFix] Add synchronize in CutlassW4A8LinearKernel to ensure data is ready for use. (#33078)
Co-authored-by: jinwuguo <jinwuguo@tencent.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-31 08:14:54 +00:00
Fadi Arafeh
1618e25492
[CPU][Feat] Enable KleidiAI accelerated int4 dynamic quant with BF16 activations on Arm CPUs (#33122)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-31 07:16:22 +00:00
AutumnAurelium
f3888aca83
Add EAGLE3 support for AFMoE (#33111)
Signed-off-by: AutumnAurelium <88015631+AutumnAurelium@users.noreply.github.com>
2026-01-31 06:53:08 +00:00
Dimitrios Bariamis
f0bca83ee4
Add support for Mistral Large 3 inference with Flashinfer MoE (#33174)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-01-30 22:48:27 -08:00
Matthias Gehre
73419abfae
[Bugfix] Handle Asym W4A16 (ConchLinearKernel) for CT (#33200)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-31 06:21:51 +00:00
Nicolò Lucchesi
e77f162cf5
[Bugfix] Fix Qwen3ASR language asr tag in output (#33410)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-31 05:24:49 +00:00
Patrick von Platen
15e0bb9c42
[Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-31 04:49:00 +00:00
Matthew Bonanni
aaa901ad55
[Attention] Move MLA forward from backend to layer (#33284)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-30 19:30:00 -08:00
Wentao Ye
010ec0c30e
[Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 (#33362)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-31 02:54:16 +00:00
Gregory Shtrasberg
31aedfe7d6
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-01-30 19:05:23 -06:00
Michael Goin
67ebaff528
Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-30 16:37:42 -08:00
Pavani Majety
c3a9752b0c
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2026-01-30 10:30:46 -08:00