Kunshang Ji | cb9574eb85 | 2026-02-11 00:27:15 -08:00
  [XPU][9/N] clean up existing ipex code/doc (#34111)
  Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

bnellnm | d1481ba783 | 2026-02-10 19:51:07 -05:00
  [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
  Signed-off-by: Bill Nell <bnell@redhat.com>

xuebwang-amd | b129136c7a | 2026-02-10 10:08:05 -05:00
  [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
  Signed-off-by: xuebwang-amd <xuebwang@amd.com>
  Signed-off-by: Robert Shaw <robshaw@redhat.com>
  Co-authored-by: Robert Shaw <robshaw@redhat.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Kunshang Ji | 7439e4f41b | 2026-02-06 13:03:59 +08:00
  [XPU][4/N] add mxfp4 moe model support (#33679)
  Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

bnellnm | a57c8228ff | 2026-02-05 18:07:18 +00:00
  [MoE Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
  Signed-off-by: Bill Nell <bnell@redhat.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

杨朱 · Kiki | 1a7894dbdf | 2026-01-30 01:56:59 -08:00
  [Misc] Replace Optional[X] with X | None syntax (#33332)
  Signed-off-by: carlory <baofa.fan@daocloud.io>
  Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Angela Yi | 4197168ea5 | 2026-01-28 16:03:56 -05:00
  [ez] Remove checks for torch version <= 2.8 (#33209)
  Signed-off-by: angelayi <yiangela7@gmail.com>

Matthew Bonanni | a608b4c6c2 | 2026-01-27 10:02:51 -05:00
  [5/N][Attention] Finish eliminating vllm/attention folder (#32064)
  Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Pengchao Wang | 8caffd92df | 2026-01-26 15:13:18 -08:00
  [Bugfix][MXFP4] Call trtllm_fp4_block_scale_moe with kwargs (#33104)
  Signed-off-by: Pengchao Wang <wpc@fb.com>

bnellnm | dc917cceb8 | 2026-01-22 18:21:35 -05:00
  [MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996)
  Signed-off-by: Bill Nell <bnell@redhat.com>

elvischenv | 808d6fd7b9 | 2026-01-21 08:49:50 -08:00
  Bump Flashinfer to v0.6.1 (#30993)
  Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Robert Shaw | 42135d6898 | 2026-01-21 08:22:33 -05:00
  [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)

bnellnm | 327a02d8db | 2026-01-18 11:40:49 -05:00
  [MoE Refactor] Separate Router into OO Classes (#30623)
  Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm | e74698c27a | 2026-01-08 20:52:55 +00:00
  [Misc][Refactor] Add FusedMoERouter object (#30519)
  Signed-off-by: Bill Nell <bnell@redhat.com>

BlankR | 0790f07695 | 2026-01-07 09:00:16 +00:00
  [Misc] Improve error messages for unsupported types and parameters (#30593)
  Signed-off-by: BlankR <hjyblanche@gmail.com>
  Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Vaibhav Sourirajan | 1f43c121d5 | 2026-01-01 21:13:36 -08:00
  Remove unused use_marlin variable in Mxfp4MoEMethod (#31549)
  Signed-off-by: vaibhav sourirajan <vs2787@columbia.edu>

baonudesifeizhai | 54c8924384 | 2025-12-20 18:22:04 +00:00
  [MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
  Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
  Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
  Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
  Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
  Co-authored-by: Robert Shaw <robshaw@redhat.com>
  Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>

Xin Yang | 9a5e96523b | 2025-12-18 08:42:22 -08:00
  [LoRA] Set default MXFP4 LoRA backend to Marlin (#30598)
  Signed-off-by: Xin Yang <xyangx@amazon.com>
  Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Roberto L. Castro | 4fa7ce46f3 | 2025-12-12 19:34:23 -08:00
  [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
  Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
  Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
  Co-authored-by: youkaichao <youkaichao@gmail.com>

bnellnm | 00e5cbb967 | 2025-12-09 13:48:25 -08:00
  [MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply (#29066)
  Signed-off-by: Bill Nell <bnell@redhat.com>
  Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
  Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
  Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Xin Yang | a491b0911b | 2025-11-30 10:37:25 +08:00
  [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
  Signed-off-by: Xin Yang <xyangx@amazon.com>
  Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Jinzhen Lin | 1656ad3704 | 2025-11-29 07:19:33 -08:00
  [Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
  Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
  Signed-off-by: Michael Goin <mgoin64@gmail.com>
  Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>
  Co-authored-by: Michael Goin <mgoin@redhat.com>

Huamin Li | 3fd1fb0b60 | 2025-11-28 15:26:52 -08:00
  Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
  Signed-off-by: Huamin Li <3ericli@gmail.com>

Xin Yang | 745a3bae1a | 2025-11-28 10:48:28 +08:00
  [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
  Signed-off-by: Xin Yang <xyangx@amazon.com>
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Matthew Bonanni | fc1d8be3dc | 2025-11-27 11:19:09 -05:00
  [Attention] Update attention imports (#29540)
  Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Michael Goin | 7df0289782 | 2025-11-25 22:52:31 +00:00
  Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441)
  Signed-off-by: Michael Goin <mgoin64@gmail.com>
  Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Michael Goin | c17610e2ba | 2025-11-24 18:22:46 -05:00
  [Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
  Signed-off-by: mgoin <mgoin64@gmail.com>

bnellnm | 8f066146c3 | 2025-11-24 13:38:04 -05:00
  [MoE][Refactor] Make select_experts a non-static method (#29067)
  Signed-off-by: Bill Nell <bnell@redhat.com>

Xin Yang | 468a8d72ba | 2025-11-19 13:05:22 +08:00
  [Bugfix] Fix FusedMoEModularKernel for triton backend (#28913)
  Signed-off-by: Xin Yang <xyangx@amazon.com>

Zhewen Li | f8b19c0ffd | 2025-11-17 13:15:26 -05:00
  [Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
  Signed-off-by: zhewenli <zhewenli@meta.com>

Zijing Liu | 5e973209aa | 2025-11-13 11:30:04 +00:00
  [BugFix] Fix type error when assigning a triton kernel tensor to a torch.nn.Parameter (#28603)
  Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

Harry Mellor | 54aecd9ed5 | 2025-11-12 06:13:41 -08:00
  Fix pre-commit (and XPU) on main (#28556)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Michael Goin | e5f599d4d1 | 2025-11-11 23:16:12 +00:00
  [Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410)
  Signed-off-by: mgoin <mgoin64@gmail.com>

xuebwang-amd | 5a1271d83a | 2025-11-11 12:06:00 -05:00
  [Quantization] fix attention quantization of gpt_oss model (#27334)
  Signed-off-by: xuebwang-amd <xuebwang@amd.com>

bnellnm | a1448b4b69 | 2025-11-11 07:29:02 -07:00
  [Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064)

Kunshang Ji | 1aaecda078 | 2025-11-08 00:33:11 +00:00
  [XPU] Enable Expert parallel for MoE models (#28263)
  Signed-off-by: Yan Ma <yan.ma@intel.com>
  Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

smit kadvani | 11fd69dd54 | 2025-11-07 05:27:42 +00:00
  [amd][gptoss] Perf gain because of block alignment (#28024)
  Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
  Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>

Kunshang Ji | 18b39828d9 | 2025-11-05 02:17:23 +00:00
  [XPU] Add gpt-oss model support for Intel GPU (#27786)
  Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

bnellnm | 938772af03 | 2025-11-04 21:59:45 +08:00
  [Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123)

Varun Sundar Rabindranath | 4022a9d279 | 2025-11-04 15:56:21 +08:00
  [BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)

Varun Sundar Rabindranath | 5d3be3ba4c | 2025-10-27 07:32:50 -07:00
  [Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Cyrus Leung | 7c2bdb83dc | 2025-10-27 09:05:40 +00:00
  [Misc] Clean up utils (#27552)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Varun Sundar Rabindranath | 269c4db0a4 | 2025-10-24 23:29:24 +00:00
  [Misc][DP] Guard mxfp4 implementation selection (#27484)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

fhl2000 | 284cc92275 | 2025-10-24 05:11:05 -07:00
  [MISC] cudagraph_capture_sizes related improvements (#26016)
  Signed-off-by: fhl <2410591650@qq.com>
  Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
  Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
  Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

dongbo910220 | 3ae082c373 | 2025-10-22 10:44:21 -04:00
  [Chore] Separate out optional dependency checks from vllm.utils (#27207)
  Signed-off-by: dongbo910220 <1275604947@qq.com>
  Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
  Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
  Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Varun Sundar Rabindranath | 5ff5d94e77 | 2025-10-21 01:51:14 -04:00
  [Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>

Isotr0py | 6ac5e06f7c | 2025-10-18 09:48:22 -07:00
  [Chore] Clean up pytorch helper functions in vllm.utils (#26908)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Signed-off-by: isotr0py <2037008807@qq.com>

jiahanc | 41d3071918 | 2025-10-16 16:20:25 -07:00
  [NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
  Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>

Varun Sundar Rabindranath | fb0571b077 | 2025-10-16 12:53:11 -07:00
  [GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Varun Sundar Rabindranath | 8ae169286f | 2025-10-14 02:22:16 +00:00
  [torch.compile] Unwrap fused_marlin_moe custom op (#26739)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>