biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Xinan Miao	2cdf92228c	[Feature]: Remove Chunking From FusedMoE (#34086 ) Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>	2026-03-12 14:24:38 -04:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Kunshang Ji	8ad54a991b	[Platform] Add current_platform.num_compute_units interface (#35042 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-02-24 22:22:49 -08:00
Mohammad Miadh Angkad	dd6a6e1190	[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 05:24:44 -08:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162 )	2026-02-05 16:59:28 -08:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
tjp_zju	2f03035a61	"refactor: refactor_repeated_interfaces" (#32486 ) Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-18 22:07:01 +08:00
Matthew Bonanni	4a8412f773	[UX] Reduce DeepGEMM warmup log output to single progress bar (#30903 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 20:21:51 -08:00
Michael Goin	d4d2751732	Update note comment for flashinfer attention warmup (#30711 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 21:29:03 -08:00
Lucas Wilkinson	aacf0abf8b	[BugFix] Fix `AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale'` (#30399 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-10 07:59:23 -08:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Wentao Ye	de540c0354	[Feature] Add env var `VLLM_MOE_USE_DEEP_GEMM` (#28422 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-11 02:29:48 +00:00
bnellnm	938772af03	[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123 )	2025-11-04 21:59:45 +08:00
Varun Sundar Rabindranath	4022a9d279	[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904 )	2025-11-04 15:56:21 +08:00
Varun Sundar Rabindranath	e5e076cad7	[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-30 08:24:31 -07:00
Varun Sundar Rabindranath	5ff5d94e77	[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-21 01:51:14 -04:00
Wentao Ye	b3dda72c23	[Feature] Migrate DeepGEMM API from `get_m_alignment_for_contiguous_layout` to `get_mk_alignment_for_contiguous_layout` (#26935 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-16 16:46:48 -04:00
Michael Goin	0d21b9b51e	[UX] Speedup DeepGEMM warmup with heuristics (#25619 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-13 07:59:27 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Wentao Ye	88d7bdbd23	[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' (#25519 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-24 00:07:51 +00:00
bnellnm	f11e3c516b	[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 16:11:34 +00:00
Lucas Wilkinson	9fac6aa30b	[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-18 14:26:28 -07:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Lucas Wilkinson	2e6bc46821	[Startup] Make DeepGEMM warmup scale with max-num-batched-tokens (#24693 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-11 20:10:19 -04:00
Duncan Moss	074854b24f	[Kernel][B200] `mxfp4` fused cutlass moe (#23696 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 17:04:56 -04:00
Michael Goin	fba7856581	[Perf] Warmup FlashInfer attention during startup (#23439 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-10 15:03:17 -07:00
nvjullin	279a5f31b3	[Kernel] Add nvfp4 gemm flashinfer backends (#22346 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-14 16:03:55 -04:00
Harry Mellor	00976db0c3	[Docs] Fix warnings in docs build (#22588 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-10 05:49:51 -07:00
Varun Sundar Rabindranath	f703b923f3	[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-08 16:09:59 -07:00

31 Commits