Tyler Michael Smith
|
68aaeb3749
|
[EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-23 11:07:47 -07:00 |
|
Wentao Ye
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
bnellnm
|
29fa5cac1c
|
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-11 12:53:10 -04:00 |
|
Varun Sundar Rabindranath
|
5cf2daea9a
|
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-09 10:50:39 -04:00 |
|
ElizaWszola
|
84166fee97
|
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-06 18:26:11 -07:00 |
|
Chiyue Wei
|
61059bee40
|
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
|
2025-06-05 09:48:26 -07:00 |
|
Varun Sundar Rabindranath
|
c3fd4d669a
|
[Kernel] Integrate batched/masked deepgemm kernel (#19111)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-04 21:59:18 +00:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
vllmellm
|
0f5e0d567e
|
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-31 03:39:31 -07:00 |
|
Tyler Michael Smith
|
6e588da0f4
|
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-22 12:13:54 -07:00 |
|
bnellnm
|
c6c10ca920
|
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-21 06:46:39 -07:00 |
|
bnellnm
|
92247c522e
|
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-20 22:37:08 -07:00 |
|
bnellnm
|
f9c069c85e
|
Modularize fused experts and integrate PPLX kernels (#15956)
|
2025-05-14 13:11:54 -07:00 |
|
TJian
|
612c2edb4f
|
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-14 03:03:11 -07:00 |
|
vllmellm
|
2d912fb66f
|
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-05-13 22:03:47 -07:00 |
|
Jinzhen Lin
|
d74e5f37bc
|
[Kernel] fp4 marlin kernel (#17687)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-05-10 19:58:49 -07:00 |
|
Pavani Majety
|
0c0fdae84f
|
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)
|
2025-05-09 16:24:41 -07:00 |
|
Cyrus Leung
|
6e5595ca39
|
[CI/Build] Automatically retry flaky tests (#17856)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-09 09:55:17 -06:00 |
|
Jinzhen Lin
|
1d0c9d6b2d
|
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-05-05 09:39:30 -07:00 |
|
Caleb_Du
|
3e887d2e0c
|
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-05-02 11:31:55 -07:00 |
|
Michael Goin
|
6317a5174a
|
Categorize tests/kernels/ based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-23 09:21:07 -04:00 |
|