Commit Graph

61 Commits

Author SHA1 Message Date
Michael Goin
f72061a19a [UX] More descriptive reasons in is_supported_config for MoE (#34908)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-19 15:20:52 -08:00
Michael Goin
ff1f83b056 [Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 17:29:32 -08:00
bnellnm
d1481ba783 [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-02-10 19:51:07 -05:00
TomerBN-Nvidia
995bbf38f1 [Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087)
Signed-off-by: Tomer Natan <tbarnatan@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 16:44:18 +00:00
bnellnm
a57c8228ff [Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-05 18:07:18 +00:00
Robert Shaw
5a93b9162b [MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>
2026-01-27 01:28:02 +00:00
Robert Shaw
42135d6898 [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414) 2026-01-21 08:22:33 -05:00
Andrey Khalyavin
ba29ab441e Use the same memory for workspace13 and fused_output. (#31531)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
2026-01-18 19:14:22 +00:00
danielafrimi
3f72639d36 [FIX] Add NO_MUL activation support for modular kernel path (#31528)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Signed-off-by: <>
Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@pool0-01777.cm.cluster>
2026-01-12 11:55:49 -05:00
Michael Goin
87e07a6b46 Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978) 2026-01-08 11:31:53 -08:00
Rabi Mishra
25eef3dc2e feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:27:09 +08:00
Robert Shaw
268b1c55ad [MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-03 12:26:36 -08:00
Ming Yang
8372be2828 [moe] Use enable_chunking func (to support disabling chunking) (#29935)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-18 09:02:38 +00:00
Varun Sundar Rabindranath
e3fc374a9a [BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899) 2025-12-17 15:00:59 -08:00
Wentao Ye
3778673ea8 [Feat] Refactor for parallel_config in FusedMoEModularKernel (#30282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Lucas Wilkinson
3e41992fec [Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-12 05:57:47 -08:00
czhu-cohere
f6227c22ab [Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-12-08 19:29:06 -08:00
Wentao Ye
7b5575fa7d [Bug] Fix vLLM config is not set error (#29999)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-05 16:42:12 -05:00
Lucas Wilkinson
1840c5cb18 [BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-21 11:41:52 -08:00
Wentao Ye
df44df0143 [Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 18:41:49 -07:00
Andrey Khalyavin
fd4555089a [BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
2025-11-14 10:58:18 -08:00
Varun Sundar Rabindranath
74a9a9faad [Performance][B200] Fix deepgemm prologue (#27897)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-12 13:13:03 -08:00
bnellnm
938772af03 [Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123) 2025-11-04 21:59:45 +08:00
Yeshwanth N
71b1c8b667 [Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
Chen Wu
5f6cbf60d6 [Feature][Kernel]FusedMoE LoRA (#21229)
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
2025-10-21 03:01:37 +00:00
Wentao Ye
7200a21cd1 [Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-13 18:26:37 -04:00
Harry Mellor
8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Wentao Ye
1ee35382cb [Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-09 15:13:27 -07:00
bnellnm
a462331e36 [Bugfix] Disable moe inplace for torch >= 2.9 (#26497)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-09 18:07:38 +00:00
bnellnm
da364615fc [Kernels] Modular kernel refactor (#24812)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-08 17:51:52 -04:00
Harry Mellor
b893d661b1 Fix per file ruff ignores related to simplification (#26259)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 20:31:53 +00:00
Harry Mellor
4e256cadc2 Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Varun Sundar Rabindranath
7ef40bb983 [GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-03 20:13:13 -04:00
Duncan Moss
6160ba4151 feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-09-24 18:50:04 -04:00
Lucas Wilkinson
cc1dc7ed6d [Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-23 16:02:10 +00:00
Varun Sundar Rabindranath
e8db44f883 [DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-09-22 21:01:09 -07:00
bnellnm
4bdf400218 [Bugfix] Fix chunked a2_scales in modular kernels (#25264)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-19 19:42:01 +00:00
bnellnm
dc2979c585 [Kernels] Overlap shared experts with combine instead of dispatch (#24254)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-18 12:10:21 +08:00
bnellnm
5963b98b46 [Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
Matthew Bonanni
d119fc8614 [CI][Bugfix] Fix failing Blackwell test (#24993)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 15:55:02 -07:00
Sage Moore
567939953b [Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Didier Durand
f4962a6d55 [Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00
bnellnm
e9b92dcd89 [Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
bnellnm
8ad7285ea2 [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
Shu Wang
c7d8724e78 [Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037)
Signed-off-by: shuw <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-17 21:32:45 -07:00
Varun Sundar Rabindranath
11dfdf21bf [Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-17 08:10:37 +00:00
Varun Sundar Rabindranath
c0569dbc82 [Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-14 19:47:16 +00:00
Varun Sundar Rabindranath
f0c98cae27 [Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-10 14:40:38 -07:00
Varun Sundar Rabindranath
fdadb6f43a [Bugfix] Fused MoE Modular Kernel chunking loop (#20392)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-10 20:31:10 +00:00