Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size (#29642)
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Wentao Ye
a42ab317ac
[Log] Optimize startup log (#28948)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-21 08:46:20 -08:00
Wentao Ye
5031cd5d55
[Refactor] Optimize select_experts (#28069)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 18:53:15 -05:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection (#28349)
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-08 19:01:11 +00:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices (#23642)
...
Signed-off-by: Roger Meier <r.meier@siemens.com>
2025-10-30 17:36:56 +00:00
Danielle Robinson
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora (#27291)
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 02:05:24 -07:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH (#25863)
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-10-23 10:27:23 +00:00
Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA (#21229)
...
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
2025-10-21 03:01:37 +00:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
zhrrr
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
2025-10-17 07:30:35 +00:00
Boyuan Feng
08405609cc
disable graph partition in custom op (#26952)
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 11:08:47 +08:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-16 21:40:25 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-15 22:06:02 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
bnellnm
a462331e36
[Bugfix] Disable moe inplace for torch >= 2.9 (#26497)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-09 18:07:38 +00:00
bnellnm
da364615fc
[Kernels] Modular kernel refactor (#24812)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-08 17:51:52 -04:00
fxmarty-amd
41f1cf38f2
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166)
2025-10-07 09:35:26 -04:00
Harry Mellor
b893d661b1
Fix per file ruff ignores related to simplification (#26259)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 20:31:53 +00:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Varun Sundar Rabindranath
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-03 20:13:13 -04:00
Varun Sundar Rabindranath
8c9117181d
[Misc] Remove typing.List (#26150)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-03 07:00:33 +00:00
XuruiYang
845adb3ec6
[Model] Add LongCat-Flash (#23991)
...
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Michael Goin
7361ab379f
Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-23 22:48:40 +00:00
Bowen Wang
06a41334c7
[EPLB] Reduce EPLB Inference Overhead (#24573)
...
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-22 16:31:05 +00:00
bnellnm
4bdf400218
[Bugfix] Fix chunked a2_scales in modular kernels (#25264)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-19 19:42:01 +00:00
bnellnm
5963b98b46
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
Tahsin Tunan
cef32104b4
[FP8] Extend per-token-group quantization support to QuantFP8 (#24342)
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
Jee Jee Li
04ad0dc275
[benchmark] Add triton version in the moe tuned config (#24769)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 14:10:54 +08:00
Didier Durand
35bf193864
[Doc]: fix typos in Python comments (#24294)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-05 19:41:12 -07:00
Xin Yang
8fb85b7bb6
Add routed_scaling_factor to MoE grouped topk (#23123)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:48 -07:00
Wentao Ye
3af47c3cc6
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-27 14:09:08 +00:00
Xin Yang
8a3cd90af5
[Kernel] Add fused grouped_topk kernel for MoE (#23274)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-25 11:47:52 -07:00
Jee Jee Li
4d4061b6e7
[Kernel] Add cuda kernel for gpt_oss activation (#22951)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-17 05:03:24 +00:00
Michael Goin
94096a47c9
[UX] Separate marlin moe config logic from triton moe (#23006)
2025-08-16 22:16:42 -04:00
bnellnm
8ad7285ea2
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
amirkl94
b4cef5e6c7
refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812)
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 06:19:31 +00:00
Simon Mo
f1f0d2fab8
Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948)
2025-08-14 17:38:10 -07:00
Jee Jee Li
81f4b96481
[Kernel] Add cuda kernel for gpt_oss activation (#22538)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-14 17:21:29 -07:00
Chi Zhang
98deac3879
[FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791)
...
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-13 20:27:25 +08:00
Wentao Ye
f7dcce7a4a
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-11 09:39:08 -07:00
Isotr0py
7e8d685775
[Minor] Fix pre-commit error on main (#22579)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-10 00:08:23 -07:00
Jee Jee Li
0c5254b82a
[oss] Init gpt-oss bf16 support (#22508)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-09 20:19:13 -07:00
Varun Sundar Rabindranath
f703b923f3
[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-08 16:09:59 -07:00
Wentao Ye
d7b28f3415
[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-04 19:13:19 -07:00
JartX
3654847db5
feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733)
2025-08-01 21:12:19 -04:00
amirkl94
207b750e19
[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458)
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-31 06:00:01 -07:00
Isotr0py
a4528f0cac
[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321)
...
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-29 03:13:27 -07:00