Commit Graph

823 Commits

Author SHA1 Message Date
Yongye Zhu
e8ebbdde83 [Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE (#38251)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-04-06 11:57:53 -07:00
bnellnm
f01482408c [MoE Refactor][Test] FusedMoE layer test (#24675)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-04-06 17:17:23 +00:00
Micah Williamson
9570654c6d [ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-04-06 09:42:02 +08:00
shunting314
8b141ed8c3 full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
2026-04-02 21:15:01 -07:00
Carl Y
3bc2734dd0 [Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
2026-04-03 01:47:04 +00:00
Stefano Castagnetta
58262dec6e [Bugfix] Fix test mocks after SM100 restriction in #38730 (#38791)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-02 13:12:58 -04:00
Jiangyun Zhu
4eefbf9609 [Perf] fuse kernels in gdn (#37813)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-04-02 11:52:18 +00:00
Li, Jiang
c6f722b93e [CPU] Support gelu act in cpu_fused_moe (#38770)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-02 14:14:32 +08:00
Xin Yang
9bd7231106 Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-04-01 22:02:32 -07:00
Monishver
c09ad767cd Feature/silu block quant fusion v1 (#32996)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
2026-04-01 18:50:43 +00:00
Michael Goin
db5d0719e1 [Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-04-01 09:41:42 -07:00
bnellnm
7cf56a59a2 [MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-04-01 09:44:08 -04:00
Zhanda Zhu
c75a313824 [Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
2026-04-01 01:52:02 -07:00
Li, Jiang
36d7f19897 [CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-01 05:42:27 +00:00
Luka Govedič
40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
2026-03-31 22:15:05 -04:00
Yanan Cao
cc671cb110 [Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
2026-03-31 17:06:42 -04:00
yzong-rh
d9b90a07ac [MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-31 15:43:33 -04:00
Olya Kozlova
598190aac3 [fix] Remove trtllm ragged mla prefills (#36540)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2026-03-31 12:30:27 -07:00
SandishKumarHN
3896e021a0 [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010)
Signed-off-by: SandishKumarHN <sandish@fb.com>
2026-03-31 12:22:26 -04:00
wliao2
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
Yintong Lu
f09daea261 [CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
2026-03-31 15:27:37 +08:00
zhangyiming
1ac6694297 [OOT] Add OOT support for linear kernel. (#37989)
Signed-off-by: menogrey <1299267905@qq.com>
2026-03-31 14:33:21 +08:00
SandishKumarHN
bcc6f67447 [Bugfix] Use null block (0) for padded block table entries (#35431)
Signed-off-by: SandishKumarHN <sandish@fb.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-30 14:02:51 -07:00
Ilya Markov
12701e8af2 [EPLB] Optmize eplb mapping and record in router for prefill (#36261)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-03-30 19:48:33 +00:00
Johnny
b4a2f3ac36 [NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
2026-03-30 09:36:18 -07:00
roikoren755
8e6293e838 [Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-30 12:33:49 -04:00
Andreas Karatzas
bdc1719eb9 [ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-26 09:26:46 -07:00
Zhewen Li
be1a85b7a2 Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169)
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-03-26 07:59:09 -07:00
Andreas Karatzas
7d6917bef5 [ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-03-25 13:46:40 -05:00
Yongye Zhu
678b3c99e8 [MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050) 2026-03-25 10:16:40 -07:00
liangel-02
8c47fdfdb1 [FlexAttention] allow custom mask mod (#37692)
Signed-off-by: Angel Li <liangel@meta.com>
2026-03-24 16:03:24 -04:00
Ranran
dc6908ac6a [Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-23 18:31:14 -04:00
Kyle Sayers
38364a7e32 [Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-23 16:03:29 -04:00
Robert Shaw
4383f1532e [MoE] Move PF Methods to Folder (#35927) 2026-03-22 02:42:59 -06:00
Robert Shaw
6b2fa3a762 [MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-03-21 19:15:16 -04:00
Andreas Karatzas
3ffa52009f [ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 11:58:58 +08:00
Yongye Zhu
87bd91892f [MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:37:04 +00:00
Xin Yang
d0532bf38d [Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-20 11:28:41 -06:00
L.B.R.
1779c09898 [ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
2026-03-20 10:11:23 -05:00
rasmith
98ff042917 [CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since its not necessary (#36996)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-03-20 07:12:45 +08:00
Xin Yang
b1169d7be8 [Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-18 08:15:56 -07:00
Andreas Karatzas
58cde5c026 [ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 11:12:26 +08:00
Yanan Cao
ff9fbc9aff [Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 01:23:35 +00:00
Andrey Talman
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
2026-03-17 18:47:59 +00:00
Vadim Gimpelson
6c1cfbad32 Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
2026-03-16 17:48:42 -07:00
Terry Gao
3e6a1e1686 [Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
2026-03-16 18:51:46 -04:00
Krish Gupta
c0f011918d [Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (#36779)
Signed-off-by: Krish Gupta <krishom70@gmail.com>
2026-03-16 21:11:33 +00:00
leo-cf-tian
2754231ba3 [Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
2026-03-15 23:45:32 -07:00
Xinan Miao
2cdf92228c [Feature]: Remove Chunking From FusedMoE (#34086)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
2026-03-12 14:24:38 -04:00
SoluMilken
85199f9681 [Bugfix] fix main branch pre-commit error (1 line change) (#36897)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
2026-03-12 09:08:37 -07:00