biondizzle/vllm - vllm

Author	SHA1	Message	Date
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Kunshang Ji	1aaecda078	[XPU] Enable Expert parallel for MoE models (#28263 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 00:33:11 +00:00
Wentao Ye	d71af5f502	[Feature] Enable TP + EP `shared_experts` overlap with router, 3.7% E2E performance improvement (#28164 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:21:08 -08:00
Frost Mitchell	6e97eccf5d	[XPU] Enable custom routing functions in IPEX for Llama4 (#28004 ) Signed-off-by: frost-intel <frost.mitchell@intel.com>	2025-11-05 13:39:57 +00:00
bnellnm	938772af03	[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123 )	2025-11-04 21:59:45 +08:00
Tyler Michael Smith	3758757377	[Bugfix] Fix MoE Routing Simulation (#28002 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-11-03 22:26:49 +00:00
Jee Jee Li	bc4486d609	[Kernel] Enable FusedMoEModularKernel support bias (#27754 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 02:05:12 +00:00
Wentao Ye	fcb1d570bb	[Bug] Fix DeepEP low latency `assert self.batched_router_logits.size(-1) == full_router_logits.size(-1)` Bug (#27682 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-29 14:50:39 -04:00
Wentao Ye	0484b64248	[Bug] Fix shape issue for eplb expert weights (#27589 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 20:44:05 +08:00
Varun Sundar Rabindranath	5d3be3ba4c	[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-27 07:32:50 -07:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
Wentao Ye	52efc34ebf	[Log] Optimize Startup Log (#26740 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-24 19:27:04 -04:00
Alexander Matveev	9ef3d5b875	[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-10-24 00:03:14 +08:00
tomeras91	61089465a6	[Model] Add MoE support for NemotronH (#25863 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-10-23 10:27:23 +00:00
dongbo910220	3ae082c373	[Chore] Separate out optional dependency checks from vllm.utils (#27207 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 10:44:21 -04:00
Alexander Matveev	344a0017c0	[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-10-21 21:38:29 +00:00
Shu Wang	f95da13c3d	[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135 ) Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-21 01:50:31 -04:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Varun Sundar Rabindranath	fb0571b077	[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-16 12:53:11 -07:00
kliuae	1317034379	[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097 ) Signed-off-by: chenjun <junchen2@amd.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-10-16 10:41:34 +08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
bnellnm	da364615fc	[Kernels] Modular kernel refactor (#24812 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-08 17:51:52 -04:00
Benjamin Chislett	2161efe978	[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 16:16:30 -04:00
Harry Mellor	b893d661b1	Fix per file ruff ignores related to simplification (#26259 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 20:31:53 +00:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Harry Mellor	10d765482d	`FusedMoE` support for the Transformers backend (#22650 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-02 23:12:15 -07:00
Matthew Bonanni	13cdc02173	Fix MTP with deepep_low_latency (#25904 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 21:29:49 +00:00
Tyler Michael Smith	a5354b3ed2	[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-27 14:22:28 +00:00
Sage Moore	dfb9af2014	[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-26 01:25:28 -07:00
XuruiYang	845adb3ec6	[Model] Add LongCat-Flash (#23991 ) Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com>	2025-09-24 21:53:40 -07:00
Duncan Moss	6160ba4151	feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com>	2025-09-24 18:50:04 -04:00
Harry Mellor	8c853050e7	[Docs] Enable `fail_on_warning` for the docs build in CI (#25580 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-24 19:30:33 +00:00
Nikhil Gupta	359d293006	[fix]: add Arm 4bit fused moe support (#23809 ) Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>	2025-09-24 01:32:22 +00:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Joel	61d1b35561	[BugFix] Register expert_map as named buffer for wake_up and sleep (#25458 ) Signed-off-by: wuxibin <wuxibin@bytedance.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-23 21:49:13 +08:00
Varun Sundar Rabindranath	e8db44f883	[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-09-22 21:01:09 -07:00
Bowen Wang	06a41334c7	[EPLB] Reduce EPLB Inference Overhead (#24573 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-22 16:31:05 +00:00
YiwenC	9d8a2d86d2	[EPLB] Add EPLB support for hunyuan_v1 (#23078 )	2025-09-18 04:51:35 +00:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Sage Moore	567939953b	[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-16 12:21:48 -04:00
Chen Bruce	7ea5c73ad7	[Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745 ) Signed-off-by: bruceszchen <bruceszchen@tencent.com> Signed-off-by: Chen Bruce <bruceszchen@tencent.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com> Co-authored-by: lemon412 <lemon412@foxmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-16 10:55:16 +00:00
Duncan Moss	074854b24f	[Kernel][B200] `mxfp4` fused cutlass moe (#23696 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 17:04:56 -04:00
Konrad Zawora	4aa23892d6	[Bugfix] Fix platform-specific routing in CustomOp implementations (#24444 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-09-11 17:15:01 +00:00
Hyogeun Oh (오효근)	ccee371e86	[Docs] Fix warnings in `mkdocs build` (continued) (#24092 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-10 06:23:28 -07:00
bnellnm	b23fb78623	[Bugfix] Fix for 24530. Fix naive all2all shared expert overlap. (#24538 )	2025-09-09 17:53:53 -07:00
Tyler Michael Smith	955c624915	[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-08 19:01:51 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
bnellnm	e9b92dcd89	[Kernels] Overlap shared experts with send/recv (#23273 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-03 12:35:18 -04:00

164 Commits