vllmellm
|
f080a83511
|
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-10 08:20:53 -08:00 |
|
Aleksandr Malyshev
|
449de9001a
|
[ROCm] triton fp8 kernel (#27058)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-11-06 14:46:44 -05:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Lucas Kabela
|
213b64452a
|
[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-10-10 13:32:29 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
ElizaWszola
|
502640c3f9
|
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-02 19:35:13 +00:00 |
|
Wentao Ye
|
89e4050af4
|
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-30 09:15:19 +08:00 |
|
Wentao Ye
|
9fe4c2bdb9
|
[Refactor] Remove DeepGEMM OP Register (#25710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-25 20:13:41 -04:00 |
|
Tyler Michael Smith
|
1260180c67
|
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-25 08:05:21 +00:00 |
|
Wentao Ye
|
1f29141258
|
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-24 18:52:36 -04:00 |
|
Li, Jiang
|
1cbcfb94de
|
[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-24 06:21:51 +00:00 |
|
Michael Goin
|
7361ab379f
|
Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 22:48:40 +00:00 |
|
ElizaWszola
|
63400259d0
|
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-09-23 12:03:10 -07:00 |
|
bnellnm
|
f11e3c516b
|
[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 16:11:34 +00:00 |
|
Michael Goin
|
fbd6523ac0
|
Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404)
|
2025-09-18 08:53:45 -04:00 |
|
bnellnm
|
5963b98b46
|
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-17 17:43:31 -06:00 |
|
Michael Goin
|
c3aea10dc8
|
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-11 15:43:14 -07:00 |
|
Wentao Ye
|
3af47c3cc6
|
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-27 14:09:08 +00:00 |
|
Wentao Ye
|
394591e343
|
[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-21 21:01:08 -07:00 |
|
Wentao Ye
|
f7dcce7a4a
|
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-11 09:39:08 -07:00 |
|
Wentao Ye
|
eec890c1c1
|
[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (#22399)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-06 17:03:53 -07:00 |
|
TJian
|
e626d286f5
|
[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel (#21242)
|
2025-07-28 05:07:06 +00:00 |
|
Wentao Ye
|
633f6e804b
|
[Bug] Fix DeepGemm Init Error (#21554)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-24 20:07:22 -07:00 |
|
Wentao Ye
|
774d0c014b
|
[Perf] Cuda Kernel for Per Token Group Quant (#21083)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-22 07:27:15 -07:00 |
|
Wentao Ye
|
76ddeff293
|
[Doc] Remove duplicate docstring (#21012)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-15 20:09:13 -07:00 |
|
TJian
|
80d38b8ac8
|
[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs (#20880)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-07-13 15:19:32 +00:00 |
|
Wentao Ye
|
42d440c22b
|
[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-12 19:38:45 -07:00 |
|
Wentao Ye
|
e2de455c34
|
[Feature] Integrate SM100 DeepGEMM support (#20087)
|
2025-07-10 20:18:05 -07:00 |
|
Li, Jiang
|
0ec3779df7
|
[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-02 20:11:36 -07:00 |
|
Wentao Ye
|
4d36693687
|
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-28 22:06:38 +00:00 |
|
Wentao Ye
|
879f69bed3
|
[Refactor] Remove duplicate ceil_div (#20023)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-25 05:19:09 +00:00 |
|
Wentao Ye
|
a6c4b87fbc
|
Revert "[Feature] Integrate new deepgemm (#19820)" (#20049)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 19:45:22 -07:00 |
|
Wentao Ye
|
c6e3bba8e6
|
[Feature] Integrate new deepgemm (#19820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 12:51:56 -07:00 |
|
Varun Sundar Rabindranath
|
e5d35d62f5
|
[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-12 04:28:12 +00:00 |
|
artetaout
|
b8e809a057
|
[Kernel] Support deep_gemm for linear methods (#19085)
Signed-off-by: artetaout <lulala341@gmail.com>
|
2025-06-11 15:14:45 +08:00 |
|
Varun Sundar Rabindranath
|
5cf2daea9a
|
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-09 10:50:39 -04:00 |
|
Lain
|
5f2cd251d2
|
Sm100 blockwise fp8 swap ab (#18564)
|
2025-06-04 07:48:45 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Lain
|
e23564cb70
|
use ceil_div in cutlass block scaling shape check (#17918)
|
2025-05-16 03:02:58 -07:00 |
|
vllmellm
|
40de1ef455
|
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-05-13 19:08:20 -07:00 |
|
Harry Mellor
|
6223dd8114
|
Update deprecated type hinting in model_executor/layers (#18056)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-13 04:17:23 -07:00 |
|
Shu Wang
|
376786fac1
|
Add cutlass support for blackwell fp8 blockwise gemm (#14383)
Signed-off-by: Shu Wang <shuw@nvidia.com>
|
2025-05-08 15:09:55 -07:00 |
|
Mengqing Cao
|
f9bc5a0693
|
[Bugfix] Fix triton import with local TritonPlaceholder (#17446)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-06 17:53:09 +08:00 |
|
Lucas Wilkinson
|
9532c49836
|
[Attention] MLA get rid of materialization (#14770)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-13 23:39:02 -07:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
Luka Govedič
|
e1744502c2
|
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-07 05:20:16 +00:00 |
|
Woosuk Kwon
|
b382a7f28f
|
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-26 13:48:55 -08:00 |
|
Gregory Shtrasberg
|
c904fdddf6
|
[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)
|
2025-02-22 05:54:38 -08:00 |
|