Wentao Ye | 0ef7f79054 | [Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340) | 2026-03-18 14:18:34 -04:00
    Signed-off-by: yewentao256 <zhyanwentao@126.com>

Yan Ma | 894843eb25 | Replace `with torch.cuda.device` with `with torch.accelerator.device_index` (#36144) | 2026-03-11 23:12:57 -07:00
    Signed-off-by: Yan Ma <yan.ma@intel.com>

Kunshang Ji | 66a2209645 | [Hardware] Replace torch.cuda.synchronize() API with torch.accelerator.synchronize (#36085) | 2026-03-05 10:36:39 +00:00
    Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Kunshang Ji | 16d2ad1d38 | [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681) | 2026-03-04 09:49:47 +00:00
    Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
    Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
    Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Robert Shaw | 97995f6376 | [MoE Refactor] Create MK for TRTLLM Kernels (#32564) | 2026-03-03 10:39:50 -08:00
    Signed-off-by: Robert Shaw <robshaw@redhat.com>
    Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
    Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
    Co-authored-by: Robert Shaw <robshaw@redhat.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>

Matthias Gehre | 934acddef9 | [Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130) | 2026-02-13 00:14:27 -08:00
    Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
    Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

Michael Goin | ff1f83b056 | [Refactor] Replace activation: str with MoEActivation enum (#33843) | 2026-02-11 17:29:32 -08:00
    Signed-off-by: mgoin <mgoin64@gmail.com>
    Signed-off-by: Michael Goin <mgoin64@gmail.com>

Matthias Gehre | 7a048ee65f | [Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149) | 2026-02-11 03:58:56 +00:00
    Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

Jee Jee Li | 978a37c823 | [Model] GLM adaptation (#34124) | 2026-02-09 17:32:52 +08:00

Dimitrios Bariamis | f0bca83ee4 | Add support for Mistral Large 3 inference with Flashinfer MoE (#33174) | 2026-01-30 22:48:27 -08:00
    Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
    Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

danisereb | f999539869 | Add missing import of fused_topk to benchmark_moe (#32784) | 2026-01-21 18:30:10 +00:00
    Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

Robert Shaw | 42135d6898 | [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414) | 2026-01-21 08:22:33 -05:00

Yuxuan Zhang | 71832ba71e | [GLM-4.7] GLM Model support for GLM-Lite (#31386) | 2026-01-19 01:18:38 -08:00
    Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
    Signed-off-by: Yuxuan Zhang <2448370773@qq.com>

Robert Shaw | 5dcd7ef1f2 | [MoE Refactor][15/N] Apply Refactor to Fp8 (#31415) | 2026-01-07 19:42:33 -05:00

Cyrus Leung | db318326a5 | [Misc] Use deprecated for seed_everything (#31780) | 2026-01-06 11:29:55 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Alfred | a0e9ee83c7 | [Benchmark] Fix OOM during MoE kernel tuning for large models (#31604) | 2026-01-02 22:24:51 +00:00
    Signed-off-by: Alfred <massif0601@gmail.com>

Kunshang Ji | 2a2d5d2780 | Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985) | 2025-11-18 11:34:36 -08:00
    Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Xiake Sun | 03fa4d3fb3 | [Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373) | 2025-11-10 04:53:40 +00:00
    Signed-off-by: Xiake Sun <xiake.sun@amd.com>
    Signed-off-by: Xiake Sun <xisun@amd.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

tomeras91 | e4ee658672 | [Model] add optimal triton fused moe configs for NemotronH MoE (#27967) | 2025-11-04 12:59:43 +00:00
    Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

Yeshwanth N | 71b1c8b667 | [Chore] Extract math and argparse utilities to separate modules (#27188) | 2025-10-26 04:03:32 -07:00
    Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
    Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
    Signed-off-by: yeshsurya <yeshsurya@gmail.com>

wangxiyuan | 8f4b313c37 | [Misc] rename torch_dtype to dtype (#26695) | 2025-10-15 12:11:48 +00:00
    Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Lukas Geiger | 338b1bf04f | [Benchmarks] Add support for Qwen 3 VL MoE tuning (#26419) | 2025-10-08 14:01:08 +00:00
    Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

Jee Jee Li | 67f3fb0844 | [Bench] Add DeepSeekV32 to MoE benchmark (#25962) | 2025-09-30 14:13:48 -07:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

bnellnm | 5963b98b46 | [Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537) | 2025-09-17 17:43:31 -06:00
    Signed-off-by: Bill Nell <bnell@redhat.com>

Jee Jee Li | 04ad0dc275 | [benchmark] Add triton version in the moe tuned config (#24769) | 2025-09-16 14:10:54 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
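Several commits above add or fix tuned triton fused-MoE configs, and #24769 records the triton version inside them. An illustrative fragment of such a config file (key names follow vLLM's fused_moe JSON format, in a file named along the lines of `E=64,N=640,device_name=NVIDIA_H200.json`; the values below are made up and the exact schema may differ):

```json
{
  "triton_version": "3.1.0",
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "64": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 8,
    "num_warps": 8,
    "num_stages": 4
  }
}
```

Each top-level numeric key is a token batch size; the tuning script benchmarks candidate tile shapes per batch size and writes out the fastest kernel configuration for each.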
Jee Jee Li | d11ec124a0 | [Bench] Add qwen-next in benchmark_moe.py (#24661) | 2025-09-11 21:29:43 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | 62f66be1f7 | [Bugfix] Fix Qwen3-coder moe tuned config (#24072) | 2025-09-07 05:19:46 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

YUQI.CHENG | 66548f6603 | [Bugfix] Fix benchmark_moe.py for blockwise fp8 (#23823) | 2025-08-28 21:44:09 +08:00
    Signed-off-by: crischeng <420985011@qq.com>
    Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>

Jee Jee Li | 4d9c61993a | [Bugfix] Fix benchmark_moe.py (#23177) | 2025-08-19 13:39:40 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | 6d3da472bc | [Misc] Add --save-dir option to benchmark_moe (#23020) | 2025-08-16 07:26:10 +00:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | 384a052971 | [Misc] benchmark_moe supports expert parallel (#22251) | 2025-08-11 00:13:27 -07:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | 8d705996df | [Misc] Minor enhancement of benchmark_moe (#22068) | 2025-08-02 01:35:30 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Yuxuan Zhang | 10eb24cc91 | GLM-4 Update (#20736) | 2025-07-19 22:40:31 +00:00
    Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Signed-off-by: Lu Fang <fanglu@fb.com>
    Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Co-authored-by: Lu Fang <fanglu@fb.com>

Asher | 5a7fb3ab9e | [Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820) | 2025-07-17 09:10:09 +00:00
    Signed-off-by: Asher Zhang <asherszhang@tencent.com>

Wentao Ye | e2de455c34 | [Feature] Integrate SM100 DeepGEMM support (#20087) | 2025-07-10 20:18:05 -07:00

Brayden Zhong | cede942b87 | [Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516) | 2025-07-06 09:20:11 +00:00
    Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

Wentao Ye | a6c4b87fbc | Revert "[Feature] Integrate new deepgemm (#19820)" (#20049) | 2025-06-24 19:45:22 -07:00
    Signed-off-by: yewentao256 <zhyanwentao@126.com>

Wentao Ye | c6e3bba8e6 | [Feature] Integrate new deepgemm (#19820) | 2025-06-24 12:51:56 -07:00
    Signed-off-by: yewentao256 <zhyanwentao@126.com>

Tianyu Guo | 4589b94032 | [Bugfix] Fix benchmark_moe.py (#19016) | 2025-06-09 18:04:36 -07:00
    Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>

Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00
    Signed-off-by: simon-mo <simon.mo@hey.com>

Harry Mellor | 009d9e7590 | Convert benchmarks to ruff format (#18068) | 2025-05-13 13:43:29 +00:00
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

xsank | 0a9bbaa104 | [Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763) | 2025-05-08 07:50:22 +00:00
    Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
    Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>

Mengqing Cao | f9bc5a0693 | [Bugfix] Fix triton import with local TritonPlaceholder (#17446) | 2025-05-06 17:53:09 +08:00
    Signed-off-by: Mengqing Cao <cmq0113@163.com>

Xiaodong Wang | 9352cdb56d | [Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263) | 2025-05-02 19:44:19 +00:00
    Signed-off-by: Lu Fang <lufang@fb.com>
    Co-authored-by: Lu Fang <lufang@fb.com>

Caleb_Du | 3e887d2e0c | permute/unpermute kernel for moe optimization (#14568) | 2025-05-02 11:31:55 -07:00
    Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>

Michael Goin | 8fc88d63f1 | [Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328) | 2025-04-28 15:20:24 -07:00
    Signed-off-by: mgoin <mgoin64@gmail.com>

Harry Mellor | 423e9f1cbe | Use Transformers helper get_text_config() instead of checking for text_config (#17105) | 2025-04-25 08:47:35 -07:00
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Lu Fang | 55dcce91df | Upstream Llama4 Support to Main (#16113) | 2025-04-07 08:06:27 -07:00
    Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
    Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
    Signed-off-by: drisspg <drisspguessous@gmail.com>
    Signed-off-by: Jon Swenson <jmswen@gmail.com>
    Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
    Signed-off-by: Lu Fang <fanglu@meta.com>
    Signed-off-by: Xiaodong Wang <xdwang@meta.com>
    Signed-off-by: Yang Chen <yangche@fb.com>
    Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
    Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
    Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
    Signed-off-by: Lu Fang <lufang@fb.com>
    Signed-off-by: Lu Fang <fanglu@fb.com>
    Signed-off-by: Lucia Fang <fanglu@fb.com>
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Co-authored-by: Lu Fang <fanglu@fb.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

bnellnm | e59ca942f5 | Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations (#13932) | 2025-04-01 12:07:43 -04:00
    Signed-off-by: Bill Nell <bnell@redhat.com>

Jee Jee Li | a73122de96 | [Bugfix] fix benchmark moe (#14653) | 2025-03-13 16:12:42 +08:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>