Jee Jee Li
170e8ea9ea
[Misc] Unified linear print info (#23516)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-24 20:13:51 -07:00
Daifeng Li
fa78de9dc3
Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527)
...
Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-22 20:53:21 -06:00
Li, Jiang
7be5d113d8
[CPU] Refactor CPU W8A8 scaled_mm (#23071)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
Michael Goin
0cdbf5e61c
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 15:13:36 -04:00
TJian
1298c67795
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-19 15:25:57 +00:00
Michael Goin
4fc722eca4
[Kernel/Quant] Remove AQLM (#22943)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
wangxiyuan
0b1bdac6af
[Platform] Custom ops support for FusedMoe (#22509)
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-13 04:12:00 -07:00
Mickaël Seznec
4fb56914c5
[perf] Add fused MLA QKV + strided layernorm (#21116)
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:07:44 -07:00
Kevin_Xiong
c9ba8104ed
[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024)
...
Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>
2025-07-16 19:36:36 -07:00
Li, Jiang
6cc1e7d96d
[CPU] Update custom ops for the CPU backend (#20255)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-01 07:25:03 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Isotr0py
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00
GiantCroc
c154d89306
[Doc] fix arg docstring in linear layers (#18410)
...
Signed-off-by: giantcroc <1204449533@qq.com>
2025-05-21 06:45:57 -07:00
Andrzej Kotłowski
38fe728d60
[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844)
...
Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>
2025-05-14 09:39:51 +00:00
Simon Mo
dcbac4cb4b
[Model] Qwen3 Dense FP8 Compat Fixes (#17318)
...
Signed-off-by: simon-mo <xmo@berkeley.edu>
2025-04-28 14:12:01 -07:00
Lei Wang
8d32dc603d
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
...
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00
Charlie Fu
188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
...
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00
Isotr0py
40b4284fe3
[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear (#15328)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-08 10:02:23 -07:00
Pavani Majety
debd6bbf09
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00
Isotr0py
e392d85831
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 20:12:52 -07:00
Nicolò Lucchesi
69ff99fdcd
[Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
2025-03-06 09:37:26 +00:00
Isotr0py
e17e4488bd
[LoRA] Remove linear hack outside transformers backend (#14177)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:06:28 +00:00
Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support (#13167)
2025-02-27 02:08:35 -08:00
Michael Goin
09972e716c
[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119)
2025-02-12 09:19:53 -08:00
Szymon Ożóg
2b25b7d2e1
Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023)
2025-02-11 08:38:48 -08:00
Harry Mellor
249824c3bf
Refactor Linear handling in TransformersModel (#12727)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-05 04:31:12 +00:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Martin Gleize
bbe5f9de7d
[Model] Support for fairseq2 Llama (#11442)
...
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
2025-01-19 10:40:40 -08:00
kewang-xlnx
de0526f668
[Misc][Quark] Upstream Quark format to VLLM (#10765)
...
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-15 11:05:15 -05:00
Isotr0py
d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
Lucas Tucker
9c749713f6
[mypy] Forward pass function type hints in lora (#11740)
...
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
2025-01-06 07:59:36 +00:00
Michael Goin
2072924d14
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-26 15:33:30 -08:00
Isotr0py
b6374e09b0
[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-22 15:01:56 +08:00
ElizaWszola
b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00
Jee Jee Li
7eb719df13
[Bugfix] Fix Phi-3 BNB online quantization (#10417)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-19 03:21:42 +00:00
Yan Ma
6b2d25efc7
[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107)
...
Signed-off-by: yan ma <yan.ma@intel.com>
2024-11-18 11:18:05 -07:00
Li, Jiang
ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend (#7515)
2024-10-09 10:28:08 -06:00
chenqianfzh
2f4117c38e
support bitsandbytes quantization with more models (#9148)
2024-10-08 19:52:19 -06:00
Isotr0py
f19da64871
[Core] Refactor GGUF parameters packing and forwarding (#8859)
2024-10-07 10:01:46 +00:00
chenqianfzh
9855b99502
[Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)
2024-09-17 08:09:12 -07:00
Pavani Majety
efcf946a15
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)
2024-09-11 00:38:40 -04:00
Dipika Sikka
e16fa99a6a
[Misc] Update fbgemmfp8 to use vLLMParameters (#7972)
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-03 20:12:41 -06:00
Dipika Sikka
2188a60c7e
[Misc] Update GPTQ to use vLLMParameters (#7976)
2024-09-03 17:21:44 -04:00
chenqianfzh
4664ceaad6
support bitsandbytes 8-bit and FP4 quantized models (#7445)
2024-08-29 19:09:08 -04:00
Dipika Sikka
86a677de42
[misc] update tpu int8 to use new vLLM Parameters (#7973)
2024-08-29 16:46:55 -04:00
Dipika Sikka
015e6cc252
[Misc] Update compressed tensors lifecycle to remove prefix from create_weights (#7825)
2024-08-26 18:09:34 -06:00
Dipika Sikka
dd9857f5fa
[Misc] Update gptq_marlin_24 to use vLLMParameters (#7762)
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-26 17:44:54 -04:00
Dipika Sikka
665304092d
[Misc] Update qqq to use vLLMParameters (#7805)
2024-08-26 13:16:15 -06:00
Dipika Sikka
f1df5dbfd6
[Misc] Update marlin to use vLLMParameters (#7803)
2024-08-23 14:30:52 -04:00
Dipika Sikka
955b5191c9
[Misc] update fp8 to use vLLMParameter (#7437)
2024-08-22 08:36:18 -04:00