Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Carl Y
|
3bc2734dd0
|
[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-03 01:47:04 +00:00 |
|
Xin Yang
|
9bd7231106
|
Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-04-01 22:02:32 -07:00 |
|
Monishver
|
c09ad767cd
|
Feature/silu block quant fusion v1 (#32996)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
|
2026-04-01 18:50:43 +00:00 |
|
Zhanda Zhu
|
c75a313824
|
[Perf] triton bilinear_pos_embed kernel for ViT (#37948)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
|
2026-04-01 01:52:02 -07:00 |
|
whyiug
|
58c959a767
|
[Misc]: clean up non-core lint issues (#37049)
Signed-off-by: whyiug <whyiug@hotmail.com>
|
2026-03-28 10:28:16 -04:00 |
|
Liwen
|
171775f306
|
Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108)
Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-28 08:27:11 +00:00 |
|
Jee Jee Li
|
2bfbdca23c
|
[Bugfix] Fix benchmark_fused_collective.py (#38082)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-25 23:51:00 -07:00 |
|
Harry Mellor
|
d215d1efca
|
[Mypy] Better fixes for the mypy issues in vllm/config (#37902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 06:14:43 -07:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Harry Mellor
|
572b432913
|
Stop bench CLI from recursively casting all configs to dict (#37559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 14:04:03 +00:00 |
|
Wentao Ye
|
0ef7f79054
|
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:18:34 -04:00 |
|
Xin Yang
|
b1169d7be8
|
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 08:15:56 -07:00 |
|
Andrey Talman
|
68f783a727
|
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
|
2026-03-17 18:47:59 +00:00 |
|
Wei Zhao
|
a3a51d20e7
|
[Benchmark] Improvements to attention benchmark script (#37115)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-16 22:22:40 +00:00 |
|
Kunshang Ji
|
747b068136
|
[Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-03-16 10:24:48 +00:00 |
|
Matthew Bonanni
|
f444c05c32
|
[Attention] Use FA4 for MLA prefill (#34732)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-12 12:10:17 -04:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Yan Ma
|
894843eb25
|
replace with torch.cuda.device with with torch.accelerator.device_index (#36144)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-11 23:12:57 -07:00 |
|
Roberto L. Castro
|
580864d81e
|
[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
|
2026-03-09 09:50:36 -07:00 |
|
Roberto L. Castro
|
2b28b9b269
|
[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-09 09:46:57 -07:00 |
|
Harry Mellor
|
a0f44bb616
|
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:05:24 -07:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Cyrus Leung
|
792a74b973
|
[Doc] Improve UX of --enable-log-requests (#35723)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-02 08:24:09 -08:00 |
|
Wentao Ye
|
05970c772c
|
[Refactor] Remove dead code for attention benchmark script (#35418)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-26 09:53:46 -08:00 |
|
Wentao Ye
|
05972ea7e5
|
[Refactor] Remove dead or duplicate func utils or variables (#35318)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-26 10:57:56 -05:00 |
|
Hanjie Qiu
|
71dfce6aa6
|
[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109)
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
|
2026-02-26 03:17:20 +00:00 |
|
Michael Goin
|
22a97e6613
|
[Perf] Improve default triton fused moe configs (#34846)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-23 16:01:28 -08:00 |
|
Jee Jee Li
|
7291d1b288
|
[Bugfix] Fix kernel benchmark (#33752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-22 21:18:08 -08:00 |
|
Mayank Ketkar
|
648951a9c3
|
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665)
Signed-off-by: Mayank Ketkar <mketkar@zoox.com>
Signed-off-by: Mayank Ketkar <mayket04@gmail.com>
Co-authored-by: Mayank Ketkar <mketkar@zoox.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-19 19:01:00 -05:00 |
|
Jongseok Park
|
c656ba3b4d
|
[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 23:14:30 +00:00 |
|
junuxyz
|
c61a98f529
|
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-17 12:22:56 +00:00 |
|
Matthias Gehre
|
934acddef9
|
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-13 00:14:27 -08:00 |
|
Matthew Bonanni
|
f2c47886fd
|
[Attention] Add FlashInfer Sparse MLA backend (#33451)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-02-12 17:21:54 +00:00 |
|
Michael Goin
|
ff1f83b056
|
[Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-11 17:29:32 -08:00 |
|
Matthias Gehre
|
7a048ee65f
|
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-02-11 03:58:56 +00:00 |
|
Mohammad Miadh Angkad
|
d4f123cc48
|
[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-02-09 15:43:24 +00:00 |
|
Lucas Wilkinson
|
d0d97e2974
|
[Misc] Fix up attention benchmarks (#33810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-09 09:42:03 -05:00 |
|
Jee Jee Li
|
978a37c823
|
[Model] GLM adaptation (#34124)
|
2026-02-09 17:32:52 +08:00 |
|
Wentao Ye
|
77c09e1130
|
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 10:57:06 -08:00 |
|
Eldar Kurtić
|
5c52644b10
|
[Docs] Update link to Benchmark CLI documentation (#33254)
Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
|
2026-02-06 16:00:59 +00:00 |
|
Runkai Tao
|
7320ca3942
|
Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
|
2026-02-02 09:46:09 +08:00 |
|
Roy Wang
|
68feb76a6f
|
[Misc] Replace deprecated interface seed_everything (#33474)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-01-31 05:38:39 -08:00 |
|
Dimitrios Bariamis
|
f0bca83ee4
|
Add support for Mistral Large 3 inference with Flashinfer MoE (#33174)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-30 22:48:27 -08:00 |
|
Robert Shaw
|
af9b69f977
|
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 15:54:59 +00:00 |
|