biondizzle/vllm - vllm

Author	SHA1	Message	Date
Fadi Arafeh	aac0b817fa	[CPU Backend][BugFix] Fix failing CPU MoE test (#32876 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-23 12:06:51 +00:00
Karan Bansal	fa6e599a61	[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-23 08:22:37 +00:00
bnellnm	dc917cceb8	[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE` (#31996 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-22 18:21:35 -05:00
Richard Zou	654a71fc3c	[torch.compile] Improve Cold Start for MoEs (#32805 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-22 10:44:40 -05:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
elvischenv	808d6fd7b9	Bump Flashinfer to v0.6.1 (#30993 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-01-21 08:49:50 -08:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
bnellnm	327a02d8db	[MoE Refactor] Separate Router into OO Classes (#30623 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-18 11:40:49 -05:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Hongxia Yang	048bb59728	AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-01-13 23:25:10 -08:00
Rabi Mishra	69f8a0ea37	fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-13 19:11:54 +00:00
danielafrimi	3f72639d36	[FIX] Add NO_MUL activation support for modular kernel path (#31528 ) Signed-off-by: dafrimi <dafrimi@nvidia.com> Signed-off-by: <> Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local> Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: root <root@pool0-01777.cm.cluster>	2026-01-12 11:55:49 -05:00
Runkai Tao	a4d5d663e2	Add unpermute-aware fused MoE path and small-batch fallback (#29354 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-09 12:58:39 -07:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Michael Goin	87e07a6b46	Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978 )	2026-01-08 11:31:53 -08:00
Robert Shaw	9f6dcb71ae	[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-01-08 03:46:27 +00:00
Rabi Mishra	25eef3dc2e	feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-08 10:27:09 +08:00
Robert Shaw	5dcd7ef1f2	[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415 )	2026-01-07 19:42:33 -05:00
Xin Yang	0ada960a20	[Kernel] Support bias type in grouped_topk kernel (#31781 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-07 12:16:32 -08:00
Robert Shaw	af8fd73051	[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-06 15:47:04 +00:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Xinyu Chen	08f425bad1	CustomOp: test forward dispatch for grouped_topk (#31530 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-01-02 10:04:01 -05:00
Andreas Karatzas	45c1ca1ca1	[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-29 16:31:10 +09:00
Yongye Zhu	7b926e8901	[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-12-22 17:34:19 +00:00
Kevin McKay	ec58c10ce1	[Misc] Fix quantization-related typos (#31116 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-21 21:13:48 -08:00
Robert Shaw	83a317f650	[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-12-19 13:09:54 -08:00
Li, Jiang	e3ab93c896	[CPU] Refactor CPU fused MOE (#30531 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-18 14:36:49 +08:00
Xinyu Chen	3b1d440ede	CustomOp: grouped topk (#29575 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2025-12-17 17:43:00 +08:00
Wentao Ye	3778673ea8	[Feat] Refactor for `parallel_config` in `FusedMoEModularKernel` (#30282 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-12-15 04:21:36 +00:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
Lucas Wilkinson	3e41992fec	[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-12 05:57:47 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
Varun Sundar Rabindranath	19bee6d12d	[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 18:04:59 +00:00
Omer Ullman Argov	39d28108f4	[Feat] Support non-gated activations in NVFP4 modelopt path (#29004 )	2025-11-30 11:02:40 -05:00
Xin Yang	a491b0911b	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-30 10:37:25 +08:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
Huamin Li	3fd1fb0b60	Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971 )" (#29697 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-28 15:26:52 -08:00
Xin Yang	745a3bae1a	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-28 10:48:28 +08:00
bnellnm	8f066146c3	[MoE][Refactor] Make select_experts a non-static method (#29067 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-11-24 13:38:04 -05:00
rasmith	fd65015a14	[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 20:34:33 -07:00
rasmith	322cb02872	[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 17:48:09 +08:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990 ) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
amirkl94	03ee48111d	Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261 )	2025-11-16 13:39:44 -05:00
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00

160 Commits