Mayank Ketkar
648951a9c3
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init ( #34665 )
Signed-off-by: Mayank Ketkar <mketkar@zoox.com >
Signed-off-by: Mayank Ketkar <mayket04@gmail.com >
Co-authored-by: Mayank Ketkar <mketkar@zoox.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-02-19 19:01:00 -05:00
Jongseok Park
c656ba3b4d
[Kernel] Triton-based Top-k and Top-p sampler kernels ( #33538 )
Signed-off-by: js_park <cakeng@naver.com >
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com >
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-17 23:14:30 +00:00
junuxyz
c61a98f529
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior ( #34514 )
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-17 12:22:56 +00:00
Matthias Gehre
934acddef9
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config ( #34130 )
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-13 00:14:27 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend ( #33451 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-02-12 17:21:54 +00:00
Michael Goin
ff1f83b056
[Refactor] Replace activation: str with MoEActivation enum ( #33843 )
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-11 17:29:32 -08:00
Matthias Gehre
7a048ee65f
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 ( #34149 )
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-02-11 03:58:56 +00:00
Mohammad Miadh Angkad
d4f123cc48
[Kernel] FlashInfer: switch allreduce fusion to unified API ( #33985 )
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-02-09 15:43:24 +00:00
Lucas Wilkinson
d0d97e2974
[Misc] Fix up attention benchmarks ( #33810 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-09 09:42:03 -05:00
Jee Jee Li
978a37c823
[Model] GLM adaptation ( #34124 )
2026-02-09 17:32:52 +08:00
Wentao Ye
77c09e1130
[Refactor] Remove align block size logic in moe_permute ( #33449 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 10:57:06 -08:00
Eldar Kurtić
5c52644b10
[Docs] Update link to Benchmark CLI documentation ( #33254 )
Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com >
2026-02-06 16:00:59 +00:00
Runkai Tao
7320ca3942
Add unpermute-aware fused MoE LoRA path ( #32655 )
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-02 09:46:09 +08:00
Roy Wang
68feb76a6f
[Misc] Replace deprecated interface seed_everything ( #33474 )
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-01-31 05:38:39 -08:00
Dimitrios Bariamis
f0bca83ee4
Add support for Mistral Large 3 inference with Flashinfer MoE ( #33174 )
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-01-30 22:48:27 -08:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 ( #32688 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 15:54:59 +00:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas ( #32683 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-28 11:06:22 +00:00
Matthew Bonanni
e82fa448c4
Add attention benchmarking tools ( #26835 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-01-28 00:09:20 +00:00
Lifan Shen
da8d0c441a
[AMD][QWEN3-NEXT] FP8 Tunings ( #32042 )
Signed-off-by: Lifan Shen <lifans@meta.com >
2026-01-27 09:34:13 +00:00
Robert Shaw
5a93b9162b
[MoE Refactor] Integrate Naive Prepare Finalize into MK ( #32567 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com >
2026-01-27 01:28:02 +00:00
Wentao Ye
8f987883cb
[Refactor] Remove unused _moe_permute function ( #33108 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-26 16:06:45 -05:00
Roberto L. Castro
fcb9df99bd
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) ( #32520 )
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
2026-01-24 18:45:27 -07:00
Michael Goin
4561f13985
[Refactor] Rename gptq_marlin to marlin to match MoE ( #32952 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-23 16:48:12 -05:00
Wentao Ye
dfab5f3764
[Bug] Fix benchmark script moe_permute_unpermute ( #32949 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-23 16:18:56 -05:00
Xin Yang
d08b356ee0
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper ( #32619 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-22 15:47:04 -05:00
Xin Yang
63227accf5
[Kernel] Add topk_sigmoid kernel ( #31246 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-21 22:49:51 +00:00
danisereb
f999539869
Add missing import of fused_topk to benchmark_moe ( #32784 )
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-01-21 18:30:10 +00:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) ( #32744 )
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-21 11:38:04 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority ( #32414 )
2026-01-21 08:22:33 -05:00
Yuxuan Zhang
71832ba71e
[GLM-4.7] GLM Model support for GLM-Lite ( #31386 )
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Yuxuan Zhang <2448370773@qq.com >
2026-01-19 01:18:38 -08:00
Andika Rachman
5e034f2e3d
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend ( #32092 )
Signed-off-by: andikarachman <andika.rachman.y@gmail.com >
2026-01-12 10:03:28 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files ( #31916 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-09 13:10:24 -08:00
Nick Hill
29ce48221c
[Cleanup] Remove obsolete spec decoding compatibility logic ( #32003 )
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 05:44:18 +00:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 ( #31692 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-01-08 03:46:27 +00:00
Robert Shaw
5dcd7ef1f2
[MoE Refactor][15/N] Apply Refactor to Fp8 ( #31415 )
2026-01-07 19:42:33 -05:00
BlankR
0790f07695
[Misc] Improve error messages for unsupported types and parameters ( #30593 )
Signed-off-by: BlankR <hjyblanche@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-07 09:00:16 +00:00
Cyrus Leung
db318326a5
[Misc] Use deprecated for seed_everything ( #31780 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 11:29:55 +00:00
Fadi Arafeh
799b5721f6
[cpu][bench] Add CPU paged attention benchmarks ( #31720 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-06 10:57:57 +00:00
Michael Goin
e1cd7a5faf
[Bugfix] Add init_workspace_manager to moe kernel benchmarks ( #31042 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-05 19:14:33 -08:00
Alfred
a0e9ee83c7
[Benchmark] Fix OOM during MoE kernel tuning for large models ( #31604 )
Signed-off-by: Alfred <massif0601@gmail.com >
2026-01-02 22:24:51 +00:00
Amir Samani
030fc44914
use the same stream for cuda graph capture and replay for NCCL ( #29207 )
Signed-off-by: Amir Samani <asamani@nvidia.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-12-25 19:10:03 +08:00
Cyrus Leung
8cef137689
[Chore] Update more locations to use attention_config.backend ( #31153 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-22 19:19:50 -08:00
Michael Goin
06d490282f
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size ( #30897 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-21 09:41:57 -08:00
Michael Goin
0a1ab1e565
[Perf][Kernels] Vectorize csrc/activations_kernels.cu ( #29512 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-16 14:56:02 -08:00
Kevin Musgrave
c01d589813
[Benchmarks] auto_tune.sh: Use hostname variable for server requests ( #30529 )
Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 22:00:29 +00:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports ( #30601 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-13 04:42:39 -08:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim ( #30389 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 20:45:23 +00:00
Cyrus Leung
d917747c95
[Bugfix] Fix task still being passed in tests/benchmarks ( #30476 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-11 10:33:55 +00:00
Ming Yang
fba8906930
[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill ( #29710 )
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-11 08:20:45 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings ( #30397 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 19:59:39 -08:00