biondizzle/vllm - vllm

Author	SHA1	Message	Date
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Aleksandr Malyshev	449de9001a	[ROCm] triton fp8 kernel (#27058 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-11-06 14:46:44 -05:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Lucas Kabela	213b64452a	[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-10-10 13:32:29 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
ElizaWszola	502640c3f9	[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-10-02 19:35:13 +00:00
Wentao Ye	89e4050af4	[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-30 09:15:19 +08:00
Wentao Ye	9fe4c2bdb9	[Refactor] Remove DeepGEMM OP Register (#25710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-25 20:13:41 -04:00
Tyler Michael Smith	1260180c67	Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-25 08:05:21 +00:00
Wentao Ye	1f29141258	[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 18:52:36 -04:00
Li, Jiang	1cbcfb94de	[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-24 06:21:51 +00:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
ElizaWszola	63400259d0	[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 12:03:10 -07:00
bnellnm	f11e3c516b	[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 16:11:34 +00:00
Michael Goin	fbd6523ac0	Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404 )	2025-09-18 08:53:45 -04:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
Wentao Ye	394591e343	[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-21 21:01:08 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Wentao Ye	eec890c1c1	[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (#22399 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-06 17:03:53 -07:00
TJian	e626d286f5	[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel (#21242 )	2025-07-28 05:07:06 +00:00
Wentao Ye	633f6e804b	[Bug] Fix DeepGemm Init Error (#21554 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-24 20:07:22 -07:00
Wentao Ye	774d0c014b	[Perf] Cuda Kernel for Per Token Group Quant (#21083 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-22 07:27:15 -07:00
Wentao Ye	76ddeff293	[Doc] Remove duplicate docstring (#21012 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-15 20:09:13 -07:00
TJian	80d38b8ac8	[V1] [ROCm] [AITER] Upgrade AITER to commit `916bf3c` and bugfix APIs (#20880 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-07-13 15:19:32 +00:00
Wentao Ye	42d440c22b	[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:38:45 -07:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087 )	2025-07-10 20:18:05 -07:00
Li, Jiang	0ec3779df7	[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-02 20:11:36 -07:00
Wentao Ye	4d36693687	[Refactor] Create a function util and cache the results for `has_deepgemm`, `has_deepep`, `has_pplx` (#20187 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-28 22:06:38 +00:00
Wentao Ye	879f69bed3	[Refactor] Remove duplicate `ceil_div` (#20023 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-25 05:19:09 +00:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820 )" (#20049 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
Varun Sundar Rabindranath	e5d35d62f5	[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 04:28:12 +00:00
artetaout	b8e809a057	[Kernel] Support deep_gemm for linear methods (#19085 ) Signed-off-by: artetaout <lulala341@gmail.com>	2025-06-11 15:14:45 +08:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
Lain	5f2cd251d2	Sm100 blockwise fp8 swap ab (#18564 )	2025-06-04 07:48:45 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Lain	e23564cb70	use ceil_div in cutlass block scaling shape check (#17918 )	2025-05-16 03:02:58 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Harry Mellor	6223dd8114	Update deprecated type hinting in `model_executor/layers` (#18056 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:17:23 -07:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383 ) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Lucas Wilkinson	9532c49836	[Attention] MLA get rid of materialization (#14770 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 23:39:02 -07:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Woosuk Kwon	b382a7f28f	[BugFix] Make FP8 Linear compatible with torch.compile (#13918 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-26 13:48:55 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231 )	2025-02-22 05:54:38 -08:00

61 Commits