Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Zhanda Zhu
|
c75a313824
|
[Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
|
2026-04-01 01:52:02 -07:00 |
|
Liwen
|
171775f306
|
Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108)
Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-28 08:27:11 +00:00 |
|
Jee Jee Li
|
2bfbdca23c
|
[Bugfix] Fix benchmark_fused_collective.py (#38082)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-25 23:51:00 -07:00 |
|
Wentao Ye
|
0ef7f79054
|
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:18:34 -04:00 |
|
Xin Yang
|
b1169d7be8
|
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 08:15:56 -07:00 |
|
Andrey Talman
|
68f783a727
|
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
|
2026-03-17 18:47:59 +00:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Yan Ma
|
894843eb25
|
replace with torch.cuda.device with with torch.accelerator.device_index (#36144)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-11 23:12:57 -07:00 |
|
Roberto L. Castro
|
580864d81e
|
[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
|
2026-03-09 09:50:36 -07:00 |
|
Roberto L. Castro
|
2b28b9b269
|
[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-09 09:46:57 -07:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Hanjie Qiu
|
71dfce6aa6
|
[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109)
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
|
2026-02-26 03:17:20 +00:00 |
|
Michael Goin
|
22a97e6613
|
[Perf] Improve default triton fused moe configs (#34846)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-23 16:01:28 -08:00 |
|
Jee Jee Li
|
7291d1b288
|
[Bugfix] Fix kernel benchmark (#33752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-22 21:18:08 -08:00 |
|
Mayank Ketkar
|
648951a9c3
|
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665)
Signed-off-by: Mayank Ketkar <mketkar@zoox.com>
Signed-off-by: Mayank Ketkar <mayket04@gmail.com>
Co-authored-by: Mayank Ketkar <mketkar@zoox.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-19 19:01:00 -05:00 |
|
Matthias Gehre
|
934acddef9
|
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-13 00:14:27 -08:00 |
|
Michael Goin
|
ff1f83b056
|
[Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-11 17:29:32 -08:00 |
|
Matthias Gehre
|
7a048ee65f
|
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-02-11 03:58:56 +00:00 |
|
Mohammad Miadh Angkad
|
d4f123cc48
|
[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-02-09 15:43:24 +00:00 |
|
Jee Jee Li
|
978a37c823
|
[Model] GLM adaptation (#34124)
|
2026-02-09 17:32:52 +08:00 |
|
Wentao Ye
|
77c09e1130
|
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 10:57:06 -08:00 |
|
Runkai Tao
|
7320ca3942
|
Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
|
2026-02-02 09:46:09 +08:00 |
|
Roy Wang
|
68feb76a6f
|
[Misc] Replace deprecated interface seed_everything (#33474)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-01-31 05:38:39 -08:00 |
|
Dimitrios Bariamis
|
f0bca83ee4
|
Add support for Mistral Large 3 inference with Flashinfer MoE (#33174)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-30 22:48:27 -08:00 |
|
Robert Shaw
|
af9b69f977
|
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 15:54:59 +00:00 |
|
Robert Shaw
|
247d1a32ea
|
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-28 11:06:22 +00:00 |
|
Lifan Shen
|
da8d0c441a
|
[AMD][QWEN3-NEXT] FP8 Tunings (#32042)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2026-01-27 09:34:13 +00:00 |
|
Robert Shaw
|
5a93b9162b
|
[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>
|
2026-01-27 01:28:02 +00:00 |
|
Wentao Ye
|
8f987883cb
|
[Refactor] Remove unused _moe_permute function (#33108)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-26 16:06:45 -05:00 |
|
Roberto L. Castro
|
fcb9df99bd
|
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-24 18:45:27 -07:00 |
|
Michael Goin
|
4561f13985
|
[Refactor] Rename gptq_marlin to marlin to match MoE (#32952)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-23 16:48:12 -05:00 |
|
Wentao Ye
|
dfab5f3764
|
[Bug] Fix benchmark script moe_permute_unpermute (#32949)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 16:18:56 -05:00 |
|
Xin Yang
|
d08b356ee0
|
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-22 15:47:04 -05:00 |
|
Xin Yang
|
63227accf5
|
[Kernel] Add topk_sigmoid kernel (#31246)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-21 22:49:51 +00:00 |
|
danisereb
|
f999539869
|
Add missing import of fused_topk to benchmark_moe (#32784)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-01-21 18:30:10 +00:00 |
|
whx
|
1861ae8aae
|
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-21 11:38:04 -05:00 |
|
Robert Shaw
|
42135d6898
|
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
|
2026-01-21 08:22:33 -05:00 |
|
Yuxuan Zhang
|
71832ba71e
|
[GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
|
2026-01-19 01:18:38 -08:00 |
|
Andika Rachman
|
5e034f2e3d
|
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend (#32092)
Signed-off-by: andikarachman <andika.rachman.y@gmail.com>
|
2026-01-12 10:03:28 +00:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Robert Shaw
|
9f6dcb71ae
|
[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
|
2026-01-08 03:46:27 +00:00 |
|
Robert Shaw
|
5dcd7ef1f2
|
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
|
2026-01-07 19:42:33 -05:00 |
|
Cyrus Leung
|
db318326a5
|
[Misc] Use deprecated for seed_everything (#31780)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-06 11:29:55 +00:00 |
|
Fadi Arafeh
|
799b5721f6
|
[cpu][bench] Add CPU paged attention benchmarks (#31720)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-06 10:57:57 +00:00 |
|