Commit Graph

113 Commits

Author SHA1 Message Date
Robert Shaw
247d1a32ea [Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-28 11:06:22 +00:00
Robert Shaw
85f55c943c [Quantization][Deprecation] Deprecate HQQ (#32681)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:40 -05:00
maang
52d428295d [Core] Refactor ColumnParallelLinear: remove unused parameter and optimize forward (#31939)
Signed-off-by: maang <maang_h@163.com>
2026-01-10 04:19:49 +00:00
Shanshan Shen
08d954f036 [Doc] Add developer guide for CustomOp (#30886)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-09 16:21:11 +00:00
Kyuyeun Kim
cc410e8644 [Bugfix] Fix weight_loader v1 block scale (#31103)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2026-01-02 13:14:10 +08:00
CedricHuang
19cc9468fd [Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957) 2025-12-21 22:34:49 -05:00
Zhonghua Deng
969bbc7c61 [Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-19 17:17:03 +00:00
Zhuohan Li
d29483b58a [Minor] Remove unnecessary error message (#27115)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-10-17 20:02:12 +00:00
Harry Mellor
8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
7c12763b24 Fix some typing issues found by mypy==1.18.2 (#26596)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00
Isotr0py
d1ddf340c8 [V0 deprecation] Remove QKVCrossParallelLinear implementation (#26475)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-09 10:52:27 +00:00
Harry Mellor
4e256cadc2 Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Aleksandr Malyshev
53a30845be Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
2025-09-25 19:16:53 -06:00
Kyle Sayers
de94289a98 [Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-23 18:30:26 -06:00
Michael Goin
fbd6523ac0 Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404) 2025-09-18 08:53:45 -04:00
Rafael Marcelino Koike
b834b4cbf1 [USAGE] Improve error handling for weight initialization in Unquantized… (#20321)
Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
2025-09-15 16:45:49 +00:00
Didier Durand
41ae4a1eab [Doc]: fix typos in various files (#24798)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00
Isotr0py
00a4e56d8d [Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-06 09:23:12 -07:00
Isotr0py
53b19ccdd5 [Core] Allow disabling TP sharding for parallel Linear layer (#23024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-05 22:53:58 -07:00
Li, Jiang
57b1ce94f7 [CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-04 14:28:45 +08:00
Kyuyeun Kim
9480ae24e3 [Bugfix] Fix packed_factor missing attribute error (#23902)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2025-09-02 10:56:31 -07:00
Kyle Sayers
22feac8e95 [Transform] [Quantization] Add transforms to compressed tensors (#22486) 2025-08-28 02:43:48 -04:00
Hyogeun Oh (오효근)
730d0ac8b9 [Docs] Fix warnings in mkdocs build (#23649)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-26 18:19:23 +00:00
Jee Jee Li
170e8ea9ea [Misc] Unified linear print info (#23516)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-24 20:13:51 -07:00
Daifeng Li
fa78de9dc3 Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527)
Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-22 20:53:21 -06:00
Li, Jiang
7be5d113d8 [CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
Michael Goin
0cdbf5e61c [Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 15:13:36 -04:00
TJian
1298c67795 [FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-19 15:25:57 +00:00
Michael Goin
4fc722eca4 [Kernel/Quant] Remove AQLM (#22943)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-16 19:38:21 +00:00
wangxiyuan
0b1bdac6af [Platform] Custom ops support for FusedMoe (#22509)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-13 04:12:00 -07:00
Mickaël Seznec
4fb56914c5 [perf] Add fused MLA QKV + strided layernorm (#21116)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:07:44 -07:00
Kevin_Xiong
c9ba8104ed [Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024)
Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>
2025-07-16 19:36:36 -07:00
Li, Jiang
6cc1e7d96d [CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-01 07:25:03 +00:00
Simon Mo
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Isotr0py
1f1b1bc03b [V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00
GiantCroc
c154d89306 [Doc] fix arg docstring in linear layers (#18410)
Signed-off-by: giantcroc <1204449533@qq.com>
2025-05-21 06:45:57 -07:00
Andrzej Kotłowski
38fe728d60 [Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844)
Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>
2025-05-14 09:39:51 +00:00
Simon Mo
dcbac4cb4b [Model] Qwen3 Dense FP8 Compat Fixes (#17318)
Signed-off-by: simon-mo <xmo@berkeley.edu>
2025-04-28 14:12:01 -07:00
Lei Wang
8d32dc603d [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00
Charlie Fu
188b7f9b8c [Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00
Isotr0py
40b4284fe3 [Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear (#15328)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-08 10:02:23 -07:00
Pavani Majety
debd6bbf09 [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00
Isotr0py
e392d85831 [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 20:12:52 -07:00
Nicolò Lucchesi
69ff99fdcd [Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
2025-03-06 09:37:26 +00:00
Isotr0py
e17e4488bd [LoRA] Remove linear hack outside transformers backend (#14177)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:06:28 +00:00
Szymon Ożóg
7f0be2aa24 [Model] Deepseek GGUF support (#13167) 2025-02-27 02:08:35 -08:00
Michael Goin
09972e716c [Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119) 2025-02-12 09:19:53 -08:00
Szymon Ożóg
2b25b7d2e1 Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023) 2025-02-11 08:38:48 -08:00
Harry Mellor
249824c3bf Refactor Linear handling in TransformersModel (#12727)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-05 04:31:12 +00:00