Robert Shaw
|
af8fd73051
|
[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-06 15:47:04 +00:00 |
|
wangxiyuan
|
bb4337b34c
|
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-01-04 18:34:04 -08:00 |
|
Xinyu Chen
|
08f425bad1
|
CustomOp: test forward dispatch for grouped_topk (#31530)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-01-02 10:04:01 -05:00 |
|
Andreas Karatzas
|
45c1ca1ca1
|
[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-29 16:31:10 +09:00 |
|
Yongye Zhu
|
7b926e8901
|
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-12-22 17:34:19 +00:00 |
|
Kevin McKay
|
ec58c10ce1
|
[Misc] Fix quantization-related typos (#31116)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:48 -08:00 |
|
Robert Shaw
|
83a317f650
|
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 13:09:54 -08:00 |
|
Li, Jiang
|
e3ab93c896
|
[CPU] Refactor CPU fused MOE (#30531)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-18 14:36:49 +08:00 |
|
Xinyu Chen
|
3b1d440ede
|
CustomOp: grouped topk (#29575)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2025-12-17 17:43:00 +08:00 |
|
Wentao Ye
|
3778673ea8
|
[Feat] Refactor for parallel_config in FusedMoEModularKernel (#30282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-12-15 04:21:36 +00:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Lucas Wilkinson
|
3e41992fec
|
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-12 05:57:47 -08:00 |
|
Cyrus Leung
|
5a87d8b9b1
|
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:35 -08:00 |
|
Jinzhen Lin
|
879ddb09c3
|
[Kernel][MoE] optimize moe_align_block_size (#29642)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-07 01:58:47 -08:00 |
|
bnellnm
|
2902c34826
|
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-12-03 20:49:00 +00:00 |
|
Varun Sundar Rabindranath
|
19bee6d12d
|
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-12-03 18:04:59 +00:00 |
|
Omer Ullman Argov
|
39d28108f4
|
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
|
2025-11-30 11:02:40 -05:00 |
|
Xin Yang
|
a491b0911b
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-30 10:37:25 +08:00 |
|
Jinzhen Lin
|
1656ad3704
|
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
|
2025-11-29 07:19:33 -08:00 |
|
Huamin Li
|
3fd1fb0b60
|
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-28 15:26:52 -08:00 |
|
Xin Yang
|
745a3bae1a
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-28 10:48:28 +08:00 |
|
bnellnm
|
8f066146c3
|
[MoE][Refactor] Make select_experts a non-static method (#29067)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-11-24 13:38:04 -05:00 |
|
rasmith
|
fd65015a14
|
[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-21 20:34:33 -07:00 |
|
rasmith
|
322cb02872
|
[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 17:48:09 +08:00 |
|
Shu Wang
|
613abb50d5
|
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-19 13:29:06 -08:00 |
|
Qiu
|
2fd893b4ce
|
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
|
2025-11-19 15:52:44 -05:00 |
|
Harry Mellor
|
a8b70304d6
|
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 09:06:36 -08:00 |
|
amirkl94
|
03ee48111d
|
Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
|
2025-11-16 13:39:44 -05:00 |
|
Cyrus Leung
|
638e4196d1
|
[Misc] Make SchedulerConfig.max_model_len init-only (#28733)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-15 01:59:31 -08:00 |
|
Varun Sundar Rabindranath
|
6965ef436f
|
[Performance][DeepGEMM] Estimate expected_m (#28694)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-15 13:52:14 +08:00 |
|
Varun Sundar Rabindranath
|
fe1cd7704d
|
[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-13 10:16:55 -08:00 |
|
bnellnm
|
a1448b4b69
|
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064)
|
2025-11-11 07:29:02 -07:00 |
|
Varun Sundar Rabindranath
|
b039bfda8f
|
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-10 09:21:52 -08:00 |
|
vllmellm
|
f080a83511
|
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-10 08:20:53 -08:00 |
|
amirkl94
|
6b7a81185d
|
Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-05 06:06:06 -05:00 |
|
Fardin Hoque
|
b8c48c5d72
|
kernels/moe test pruning (#27053)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-30 12:10:34 +08:00 |
|
bnellnm
|
1891cf605a
|
[Bugfix] Fix modular kernel tests (#27707)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-10-29 16:14:33 +08:00 |
|
Yeshwanth N
|
71b1c8b667
|
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
|
2025-10-26 04:03:32 -07:00 |
|
Varun Sundar Rabindranath
|
a9f55dc588
|
[Misc] Add triton_kernels dependency (#27370)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-23 12:04:14 -07:00 |
|
dongbo910220
|
3ae082c373
|
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 10:44:21 -04:00 |
|
iAmir97
|
7a6c8c3fa1
|
[Chore] Separate out vllm.utils.network_utils (#27164)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
|
2025-10-19 03:06:32 -07:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
jiahanc
|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
Wentao Ye
|
b3dda72c23
|
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-16 16:46:48 -04:00 |
|
Varun Sundar Rabindranath
|
fb0571b077
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-16 12:53:11 -07:00 |
|
kliuae
|
1317034379
|
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097)
Signed-off-by: chenjun <junchen2@amd.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-10-16 10:41:34 +08:00 |
|
Varun Sundar Rabindranath
|
8ae169286f
|
[torch.compile] Unwrap fused_marlin_moe custom op (#26739)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-14 02:22:16 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Elvir Crnčević
|
7b03584de8
|
Silu v2 (#25074)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: elvircrn <elvircrn@gmail.com>
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
|
2025-10-10 15:19:53 +00:00 |
|
bnellnm
|
da364615fc
|
[Kernels] Modular kernel refactor (#24812)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-10-08 17:51:52 -04:00 |
|