Author | Commit | Message | Date
Michael Goin | 94096a47c9 | [UX] Separate marlin moe config logic from triton moe (#23006) | 2025-08-16 22:16:42 -04:00
bnellnm | 8ad7285ea2 | [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035) | 2025-08-15 14:46:00 -04:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>
amirkl94 | b4cef5e6c7 | refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812) | 2025-08-15 06:19:31 +00:00
  Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>
Simon Mo | f1f0d2fab8 | Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948) | 2025-08-14 17:38:10 -07:00
Jee Jee Li | 81f4b96481 | [Kernel] Add cuda kernel for gpt_oss activation (#22538) | 2025-08-14 17:21:29 -07:00
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Chi Zhang | 98deac3879 | [FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791) | 2025-08-13 20:27:25 +08:00
  Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Wentao Ye | f7dcce7a4a | [Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968) | 2025-08-11 09:39:08 -07:00
  Signed-off-by: yewentao256 <zhyanwentao@126.com>
Isotr0py | 7e8d685775 | [Minor] Fix pre-commit error on main (#22579) | 2025-08-10 00:08:23 -07:00
  Signed-off-by: Isotr0py <2037008807@qq.com>
Jee Jee Li | 0c5254b82a | [oss] Init gpt-oss bf16 support (#22508) | 2025-08-09 20:19:13 -07:00
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Varun Sundar Rabindranath | f703b923f3 | [Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215) | 2025-08-08 16:09:59 -07:00
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Wentao Ye | d7b28f3415 | [Log] DeepGEMM Update Log for Unaligned Problem Size (#22208) | 2025-08-04 19:13:19 -07:00
  Signed-off-by: yewentao256 <zhyanwentao@126.com>
JartX | 3654847db5 | feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733) | 2025-08-01 21:12:19 -04:00
amirkl94 | 207b750e19 | [NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458) | 2025-07-31 06:00:01 -07:00
  Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
  Signed-off-by: mgoin <mgoin64@gmail.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>
Isotr0py | a4528f0cac | [Model]: Fused MoE for nomic-embed-text-v2-moe (#18321) | 2025-07-29 03:13:27 -07:00
  Signed-off-by: isotr0py <2037008807@qq.com>
  Signed-off-by: Isotr0py <2037008807@qq.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Kaixi Hou | de509ae8eb | [NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411) | 2025-07-26 07:10:36 -07:00
  Signed-off-by: kaixih <kaixih@nvidia.com>
Kaixi Hou | 6d0734c562 | [NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645) | 2025-07-19 02:33:01 -07:00
  Signed-off-by: kaixih <kaixih@nvidia.com>
  Signed-off-by: mgoin <mgoin64@gmail.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>
Richard Zou | b2eb2b5ad7 | [Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346) | 2025-07-18 14:10:21 -04:00
  Signed-off-by: rzou <zou3519@gmail.com>
Shu Wang | c7d8724e78 | [Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037) | 2025-07-17 21:32:45 -07:00
  Signed-off-by: shuw <shuw@nvidia.com>
  Signed-off-by: mgoin <mgoin64@gmail.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>
Varun Sundar Rabindranath | 11dfdf21bf | [Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903) | 2025-07-17 08:10:37 +00:00
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Peter Pan | 1eb2b9c102 | [CI] update typos config for CI pre-commit and fix some spells (#20919) | 2025-07-15 21:12:40 -07:00
  Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Varun Sundar Rabindranath | c0569dbc82 | [Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725) | 2025-07-14 19:47:16 +00:00
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Wentao Ye | e2de455c34 | [Feature] Integrate SM100 DeepGEMM support (#20087) | 2025-07-10 20:18:05 -07:00
Duncan Moss | 5923ab9524 | [fix]: disable cutlass block scaled group gemm for EP (#20781) | 2025-07-11 02:39:18 +00:00
  Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Varun Sundar Rabindranath | f0c98cae27 | [Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648) | 2025-07-10 14:40:38 -07:00
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Varun Sundar Rabindranath | 805d62ca88 | [Misc] DP : Add ExpertTokensMetadata (#20332) | 2025-07-10 00:33:14 +00:00
  Signed-off-by: Varun <vsundarr@redhat.com>
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun <vsundarr@redhat.com>
fxmarty-amd | 332d4cb17b | [Feature][Quantization] MXFP4 support for MOE models (#17888) | 2025-07-09 13:19:02 -07:00
  Signed-off-by: Felix Marty <felmarty@amd.com>
  Signed-off-by: Bowen Bao <bowenbao@amd.com>
  Signed-off-by: Felix Marty <Felix.Marty@amd.com>
  Co-authored-by: Bowen Bao <bowenbao@amd.com>
Duncan Moss | 97abeb1daa | [feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes (#20640) | 2025-07-09 11:03:35 +08:00
  Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Duncan Moss | 3d184b95b8 | [feat]: CUTLASS block scaled group gemm for SM100 (#19757) | 2025-07-04 12:58:04 -06:00
  Signed-off-by: Duncan Moss <djm.moss@gmail.com>
  Co-authored-by: Duncan Moss <dmoss@nvidia.com>
bnellnm | 78fe77534b | [Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864) | 2025-07-03 14:55:40 -07:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
bnellnm | c1909e7e8c | [Kernels] MoE refactor (#19636) | 2025-07-02 06:08:27 -07:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
  Signed-off-by: ElizaWszola <ewszola@redhat.com>
  Co-authored-by: ElizaWszola <ewszola@redhat.com>
bnellnm | 015fab8c2f | [Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717) | 2025-06-24 23:22:58 -07:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
Varun Sundar Rabindranath | e3b12667d4 | [BugFix] : Fix Batched DeepGemm Experts (#19515) | 2025-06-12 20:43:02 -06:00
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
bnellnm | 29fa5cac1c | [Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168) | 2025-06-11 12:53:10 -04:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
ElizaWszola | 84166fee97 | [Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) | 2025-06-06 18:26:11 -07:00
  Signed-off-by: ElizaWszola <ewszola@redhat.com>
  Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
  Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Varun Sundar Rabindranath | fa98d77773 | [Kernel] DeepEP dispatch-combine kernel integration (#18434) | 2025-06-03 12:30:02 -07:00
  Signed-off-by: Varun <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00
  Signed-off-by: simon-mo <simon.mo@hey.com>
bnellnm | f9c069c85e | Modularize fused experts and integrate PPLX kernels (#15956) | 2025-05-14 13:11:54 -07:00
TJian | 612c2edb4f | [FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110) | 2025-05-14 03:03:11 -07:00
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Harry Mellor | 6223dd8114 | Update deprecated type hinting in model_executor/layers (#18056) | 2025-05-13 04:17:23 -07:00
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Hongxia Yang | ed3a1d2106 | [ROCm] fix num_stages for default moe config to avoid triton OutOfResource error (#17744) | 2025-05-07 00:39:48 +00:00
  Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Mengqing Cao | f9bc5a0693 | [Bugfix] Fix triton import with local TritonPlaceholder (#17446) | 2025-05-06 17:53:09 +08:00
  Signed-off-by: Mengqing Cao <cmq0113@163.com>
Caleb_Du | 3e887d2e0c | permute/unpermute kernel for moe optimization (#14568) | 2025-05-02 11:31:55 -07:00
  Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
kliuae | 5b794cae8d | [ROCm] Add aiter tkw1 kernel for Llama4 fp8 (#16727) | 2025-04-21 20:42:34 -07:00
  Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Jinzhen Lin | d06ba4ed3f | [Kernel] moe wna16 marlin kernel (#14447) | 2025-04-14 20:05:22 -07:00
  Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
  Co-authored-by: Michael Goin <michael@neuralmagic.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>
Michael Goin | f41647ee6b | [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) | 2025-04-11 17:54:08 +00:00
  Signed-off-by: mgoin <mgoin64@gmail.com>
TJian | 2976dc27e9 | [Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198) | 2025-04-08 19:12:34 -07:00
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
  Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
  Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
Lu Fang | 55dcce91df | Upstream Llama4 Support to Main (#16113) | 2025-04-07 08:06:27 -07:00
  Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
  Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
  Signed-off-by: drisspg <drisspguessous@gmail.com>
  Signed-off-by: Jon Swenson <jmswen@gmail.com>
  Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
  Signed-off-by: Lu Fang <fanglu@meta.com>
  Signed-off-by: Xiaodong Wang <xdwang@meta.com>
  Signed-off-by: Yang Chen <yangche@fb.com>
  Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
  Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
  Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
  Signed-off-by: Lu Fang <lufang@fb.com>
  Signed-off-by: Lu Fang <fanglu@fb.com>
  Signed-off-by: Lucia Fang <fanglu@fb.com>
  Signed-off-by: Roger Wang <ywang@roblox.com>
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  Co-authored-by: Lu Fang <fanglu@fb.com>
  Co-authored-by: Roger Wang <ywang@roblox.com>
  Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
bnellnm | d6fc629f4d | [Kernel][Minor] Re-fuse triton moe weight application (#16071) | 2025-04-04 23:27:34 +00:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
bnellnm | 15ba07ef25 | [Minor] Fused experts refactor (#15914) | 2025-04-03 10:19:38 -07:00
  Signed-off-by: Bill Nell <bnell@redhat.com>
bnellnm | e59ca942f5 | Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932) | 2025-04-01 12:07:43 -04:00
  Signed-off-by: Bill Nell <bnell@redhat.com>