biondizzle/vllm - vllm

Author	SHA1	Message	Date
Zhonghua Deng	969bbc7c61	[Model] Add MiMo-V2-Flash support (#30836 ) Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Jumiar <liuanqim10@126.com> Signed-off-by: Zyann7 <zyann7@outlook.com> Co-authored-by: Jumiar <liuanqim10@126.com> Co-authored-by: Zyann7 <zyann7@outlook.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-19 17:17:03 +00:00
Wentao Ye	f284d7bd0c	[Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attribute `weight_scale_inv` (#30823 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-17 02:00:35 -08:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
rasmith	08f8a5627e	[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 18:41:56 -05:00
ElizaWszola	2e7035dd8c	[Bugfix] Fix fp8 DeepGemm compilation issues (#30336 )	2025-12-09 20:17:25 -05:00
Charlie Fu	3c680f4a17	[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-12-09 22:39:26 +00:00
Kyle Sayers	fccd532587	[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-12-09 13:54:32 -08:00
Zhewen Li	ae339b1a67	[Bugfix] Fix DeepGEMM after #29546 (#30267 ) Signed-off-by: zhewenli <zhewenli@meta.com> Signed-off-by: Zhewen Li <zhewenli@meta.com>	2025-12-09 01:05:27 +00:00
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Wentao Ye	541a2ef892	[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 20:31:14 +08:00
Varun Sundar Rabindranath	19bee6d12d	[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 18:04:59 +00:00
TJian	a425dc256e	[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-14 10:30:50 -08:00
vllmellm	d8140b9833	[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in `_aiter_ops.py` (#28464 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath	74a9a9faad	[Performance][B200] Fix deepgemm prologue (#27897 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-12 13:13:03 -08:00
Alexander Matveev	f76e85c299	[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) (#28492 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 10:51:43 -05:00
Michael Goin	f9a4087182	Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 11:46:04 -05:00
Yong Hoon Shin	021143561f	[ROCm] Add missing gemm_a8w8_blockscale import (#28378 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-10 23:13:36 +00:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Aleksandr Malyshev	449de9001a	[ROCm] triton fp8 kernel (#27058 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-11-06 14:46:44 -05:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Lucas Kabela	213b64452a	[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-10-10 13:32:29 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
ElizaWszola	502640c3f9	[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-10-02 19:35:13 +00:00
Wentao Ye	89e4050af4	[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-30 09:15:19 +08:00
Wentao Ye	9fe4c2bdb9	[Refactor] Remove DeepGEMM OP Register (#25710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-25 20:13:41 -04:00
Tyler Michael Smith	1260180c67	Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-25 08:05:21 +00:00
Wentao Ye	1f29141258	[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 18:52:36 -04:00
Li, Jiang	1cbcfb94de	[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-24 06:21:51 +00:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
ElizaWszola	63400259d0	[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 12:03:10 -07:00
bnellnm	f11e3c516b	[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 16:11:34 +00:00
Michael Goin	fbd6523ac0	Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404 )	2025-09-18 08:53:45 -04:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
Wentao Ye	394591e343	[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-21 21:01:08 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Wentao Ye	eec890c1c1	[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (#22399 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-06 17:03:53 -07:00
TJian	e626d286f5	[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel (#21242 )	2025-07-28 05:07:06 +00:00
Wentao Ye	633f6e804b	[Bug] Fix DeepGemm Init Error (#21554 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-24 20:07:22 -07:00
Wentao Ye	774d0c014b	[Perf] Cuda Kernel for Per Token Group Quant (#21083 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-22 07:27:15 -07:00
Wentao Ye	76ddeff293	[Doc] Remove duplicate docstring (#21012 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-15 20:09:13 -07:00
TJian	80d38b8ac8	[V1] [ROCm] [AITER] Upgrade AITER to commit `916bf3c` and bugfix APIs (#20880 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-07-13 15:19:32 +00:00
Wentao Ye	42d440c22b	[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-12 19:38:45 -07:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087 )	2025-07-10 20:18:05 -07:00
Li, Jiang	0ec3779df7	[Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-02 20:11:36 -07:00
Wentao Ye	4d36693687	[Refactor] Create a function util and cache the results for `has_deepgemm`, `has_deepep`, `has_pplx` (#20187 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-28 22:06:38 +00:00
Wentao Ye	879f69bed3	[Refactor] Remove duplicate `ceil_div` (#20023 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-25 05:19:09 +00:00

78 Commits