zofia | b482f71e9f | 2026-02-11 03:33:59 +00:00
[XPU][7/N] enable xpu fp8 moe (#34202)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Tyler Michael Smith | 066c6da6a0 | 2026-02-10 19:15:43 -08:00
[WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

bnellnm | d1481ba783 | 2026-02-10 19:51:07 -05:00
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>

Pavani Majety | 578977bb5e | 2026-02-10 16:18:43 -05:00
[SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>

Roberto L. Castro | afdce12c89 | 2026-02-10 10:29:52 -05:00
[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

xuebwang-amd | b129136c7a | 2026-02-10 10:08:05 -05:00
[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Gregory Shtrasberg | c60f8e3b49 | 2026-02-09 17:38:54 -06:00
[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

TomerBN-Nvidia | 995bbf38f1 | 2026-02-09 16:44:18 +00:00
[Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087)
Signed-off-by: Tomer Natan <tbarnatan@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

JJJYmmm | 9562912cea | 2026-02-09 21:12:58 +08:00
[MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm <1650675829@qq.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wulipc <wulipc@users.noreply.github.com>
Co-authored-by: ywang96 <ywang96@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>

zofia | 9bdb06b436 | 2026-02-09 20:17:35 +08:00
[XPU][6/N] add xpu scaled_mm kernel (#34117)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Andrey Talman | f97ca67176 | 2026-02-08 13:51:09 -08:00
[Release 2.10] Update to Torch 2.10 - final release (#30525)

danisereb | 084aa19f02 | 2026-02-08 11:16:48 -08:00
Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

navmarri14 | 1ecfabe525 | 2026-02-08 18:55:47 +00:00
GLM-4.6 fused tuned inference config for B200 (#32958)

TomerBN-Nvidia | a263aa6140 | 2026-02-08 17:18:22 +00:00
[BugFix] Add support for no act-and-mul in Marlin (#34088)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>

Hashem Hashemi | ed17f54c8b | 2026-02-07 05:33:11 -08:00
Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>

Rohan Potdar | de3869bb4d | 2026-02-07 05:30:09 -08:00
move checks out of unified_kv_cache_update custom op (#33943)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

whx | ce9b3cd3e9 | 2026-02-07 05:26:05 -08:00
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu <2952154980@qq.com>

lukec | 15a0b9e570 | 2026-02-06 23:58:50 -08:00
Fix spelling errors (#33978)

Dimitrios Bariamis | 207c3a0c20 | 2026-02-06 14:03:34 -08:00
Fix RoutingMethodType logic (#33919)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

xuebwang-amd | 9e9acce577 | 2026-02-06 19:11:32 +00:00
[Bugfix] Fix 'no attribute' error in SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Charlie Fu | fe5438200b | 2026-02-06 19:09:59 +00:00
[ROCm][Bugfix] Fix dtype mismatch in the gemm_a4w4 op (#33734)
Signed-off-by: charlifu <charlifu@amd.com>

Wentao Ye | 77c09e1130 | 2026-02-06 10:57:06 -08:00
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

zofia | 2ce9fe4ad0 | 2026-02-06 15:59:53 +00:00
[XPU][5/N] add wna16 xpu kernel (#33973)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Fadi Arafeh | f79d9dce16 | 2026-02-06 11:59:20 +00:00
[CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

Kunshang Ji | 7439e4f41b | 2026-02-06 13:03:59 +08:00
[XPU][4/N] add mxfp4 moe model support (#33679)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Rabi Mishra | 20d7454c9b | 2026-02-06 02:22:53 +00:00
fix(ROCm): Make flash_attn import optional in MLA attention (#33511)
Signed-off-by: rabi <ramishra@redhat.com>

Xin Yang | 79028d4388 | 2026-02-05 20:34:00 -05:00
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)

Hashem Hashemi | d5c4800112 | 2026-02-05 22:16:02 +00:00
Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>

Matthew Bonanni | 4145e50d85 | 2026-02-05 19:22:19 +00:00
[Bugfix] Fix DSV3.2 NVFP4 (#33932)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

bnellnm | a57c8228ff | 2026-02-05 18:07:18 +00:00
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Aaron Hao | c1858b7ec8 | 2026-02-05 12:13:23 -05:00
[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>

wang.yuqi | 1c3a221d3b | 2026-02-05 02:51:22 -08:00
[Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

jiahanc | 59a5cb387a | 2026-02-05 05:23:11 -05:00
[perf] Integrate flashinfer concat_mla_k (#31171)

Andreas Karatzas | 3e472e81f9 | 2026-02-05 10:01:23 +00:00
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>

Fadi Arafeh | fd03538bf9 | 2026-02-05 06:26:09 +00:00
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

Chauncey | a7be77beef | 2026-02-05 01:28:36 +00:00
[Bugfix] Fix DeepSeek R1 with CUTLASS MLA broken on B200 (#33637)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

Simon Danielsson | 4292c90a2a | 2026-02-04 20:17:41 +00:00
[Bugfix] Support RotaryEmbedding CustomOp for gpt-oss (#33800)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

Taeksang Kim | 6e98f6d8b6 | 2026-02-04 20:11:39 +00:00
Implement zero-copy GQA for multimodal and CPU (#33732)
Signed-off-by: Taeksang Kim <ts.kim@hyperaccel.ai>

Vadim Gimpelson | 824058076c | 2026-02-04 11:20:52 +00:00
[PERF] Change GDN Attention State Layout from [N, HV, K, V] to [N, HV, V, K] (#33291)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

Kunshang Ji | f79f777803 | 2026-02-04 02:12:25 -08:00
[XPU][2/N] add unquantized moe support for xpu (#33659)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Frank Wang | 45f8fd6f97 | 2026-02-04 13:27:34 +08:00
[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>

Matthew Bonanni | bd8da29a66 | 2026-02-03 15:29:48 -08:00
[Bugfix] Fix sparse MLA metadata building (#33579)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Michael Goin | 2a99c5a6c8 | 2026-02-03 13:26:51 -08:00
[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613)
Signed-off-by: mgoin <mgoin64@gmail.com>

Vadim Gimpelson | a372f3f40a | 2026-02-03 15:10:31 -05:00
[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

zxy | a3acfa1071 | 2026-02-03 05:49:45 -08:00
[Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Michael Goin | e346e2d056 | 2026-02-03 10:37:15 +00:00
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>

Kunshang Ji | e10604480b | 2026-02-02 22:46:10 -08:00
[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Vasiliy Kuznetsov | 0130223bd9 | 2026-02-02 14:17:42 -05:00
fix memory for online fp8 quantization with streaming weight load (#31914)
Signed-off-by: vasiliy <vasiliy@fb.com>

danielafrimi | 0aca8b8c62 | 2026-02-02 09:18:50 -05:00
[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790)
Signed-off-by: dafrimi <dafrimi@nvidia.com>

csy0225 | c3b40dc3e7 | 2026-02-02 10:21:18 +08:00
[Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>