Commit Graph

26 Commits

Author SHA1 Message Date
bnellnm
e9b92dcd89 [Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
Xin Yang
8fb85b7bb6 Add routed_scaling_factor to MoE grouped topk (#23123)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:48 -07:00
Isotr0py
4645024d3a [Quantization] Allow GGUF quantization to skip unquantized layer (#23188)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 13:04:22 -06:00
bnellnm
8ad7285ea2 [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
Bowen Wang
e9fd658a73 [Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-06-26 15:30:21 -07:00
Szymon Ożóg
dec66d253b [Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-06-16 17:33:26 +08:00
Simon Mo
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Isotr0py
1f1b1bc03b [V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00
Harry Mellor
6223dd8114 Update deprecated type hinting in model_executor/layers (#18056)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:17:23 -07:00
Szymon Ożóg
1a45a61387 [Kernel] GGUF MoeVec kernel (#16780)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-06 23:07:23 -07:00
Harry Mellor
13698db634 Improve configs - ModelConfig (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
Lu Fang
55dcce91df Upstream Llama4 Support to Main (#16113)
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 08:06:27 -07:00
LukasBluebaum
90969fb39a [Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
2025-04-02 01:58:48 -07:00
Isotr0py
179a619c21 [Bugfix] Fix broken CPU quantization due to triton import (#15038)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-18 08:57:39 -07:00
Szymon Ożóg
55211b01e8 [Bugfix] Fix chunked prefill for GGUF (#14666)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-13 07:19:03 +00:00
Szymon Ożóg
e22ee1e7a2 [Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00
Szymon Ożóg
89cdaa83e7 [Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00
Szymon Ożóg
7f0be2aa24 [Model] Deepseek GGUF support (#13167) 2025-02-27 02:08:35 -08:00
Kyle Sayers
12913d17ba [Quant] Add SupportsQuant to phi3 and clip (#13104) 2025-02-15 19:28:33 -08:00
Russell Bryant
e489ad7a21 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Isotr0py
b98c62ba49 [Bugfix] Fix GGUF inference with FP16 unquantized checkpoint (#10675)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-27 10:43:17 -08:00
Michael Goin
399c798608 Remove ScaledActivation for AWQ (#10057)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-06 14:27:06 +00:00
Isotr0py
f19da64871 [Core] Refactor GGUF parameters packing and forwarding (#8859) 2024-10-07 10:01:46 +00:00
Isotr0py
fc990f9795 [Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (#8357) 2024-09-15 16:51:44 -06:00
Isotr0py
7601cb044d [Core] Support tensor parallelism for GGUF quantization (#7520) 2024-08-19 17:30:14 -04:00
Isotr0py
360bd67cf0 [Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00