306d60401d | Charlie Fu | 2025-05-31 07:40:05 -07:00
  [ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
  Signed-off-by: charlifu <charlifu@amd.com>

7f21e8052b | rongfu.leng | 2025-05-30 17:34:22 +00:00
  [Misc] add group_size is -1 in awq quantization (#18910)
  Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

4d0a1541be | Michael Goin | 2025-05-30 13:37:36 +08:00
  [Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861)
  Signed-off-by: mgoin <mgoin64@gmail.com>

3de3eadf5b | Wenhua Cheng | 2025-05-29 19:24:47 -07:00
  improve the robustness of parsing vlms config in AutoRound (#18894)
  Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>

e0cbad4e30 | Satyajith Chilappagari | 2025-05-27 22:10:33 +00:00
  [Neuron] Support quantization on neuron (#18283)
  Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>

d260f799a9 | vllmellm | 2025-05-26 23:14:07 -07:00
  [FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
  Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

1f1b1bc03b | Isotr0py | 2025-05-27 04:40:28 +00:00
  [V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Signed-off-by: Isotr0py <2037008807@qq.com>

503f8487c2 | Cyrus Leung | 2025-05-24 23:03:53 -07:00
  [Misc] Reduce logs on startup (#18649)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

ec82c3e388 | Wenhua Cheng | 2025-05-23 22:01:40 -07:00
  FIX MOE issue in AutoRound format (#18586)
  Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>

4fc1bf813a | Feng XiaoLong | 2025-05-23 16:16:26 -07:00
  [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
  Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
  Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>

6a7988c55b | youkaichao | 2025-05-23 23:43:43 +08:00
  Refactor pplx init logic to make it modular (prepare for deepep) (#18200)
  Signed-off-by: youkaichao <youkaichao@gmail.com>

7ab056c273 | Kay Yan | 2025-05-23 04:38:42 -07:00
  [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt (#18542)
  Signed-off-by: Kay Yan <kay.yan@daocloud.io>

f4a8a37465 | Michael Goin | 2025-05-20 09:08:37 -07:00
  [Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
  Signed-off-by: mgoin <mgoin64@gmail.com>

bca55b556f | Random Fly | 2025-05-20 00:54:33 -07:00
  [Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
  Signed-off-by: rand-fly <randfly@outlook.com>

e2ee1e8e9e | Wenhua Cheng | 2025-05-19 09:38:53 -07:00
  [Feature]Add support for models quantized with AutoRound (#17850)
  Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>

e23564cb70 | Lain | 2025-05-16 03:02:58 -07:00
  use ceil_div in cutlass block scaling shape check (#17918)

7974736740 | Jerry Zhang | 2025-05-14 16:24:59 -07:00
  Add support for loading torchao models with AOPerModuleConfig (#17826)
  Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

f9c069c85e | bnellnm | 2025-05-14 13:11:54 -07:00
  Modularize fused experts and integrate PPLX kernels (#15956)

612c2edb4f | TJian | 2025-05-14 03:03:11 -07:00
  [FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110)
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

9a2a6357de | Michael Goin | 2025-05-13 19:48:33 -07:00
  [Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026)
  Signed-off-by: mgoin <mgoin64@gmail.com>

65f0f74b66 | Pavani Majety | 2025-05-13 19:33:00 -07:00
  [Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101)
  Signed-off-by: Pavani Majety <pmajety@nvidia.com>

40de1ef455 | vllmellm | 2025-05-13 19:08:20 -07:00
  [FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968)
  Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>

6223dd8114 | Harry Mellor | 2025-05-13 04:17:23 -07:00
  Update deprecated type hinting in model_executor/layers (#18056)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

ea6ae8cb45 | Michael Goin | 2025-05-13 07:53:28 +00:00
  [Bugfix] Fix marlin moe fallback logic for llama4 (#18042)
  Signed-off-by: mgoin <mgoin64@gmail.com>

1df491c522 | Michael Goin | 2025-05-13 03:50:04 +00:00
  [Bugfix] Fixes for new marlin moe usage (#18017)
  Signed-off-by: mgoin <mgoin64@gmail.com>

307939f299 | Michael Goin | 2025-05-12 18:07:34 -06:00
  Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 (#18000)
  Signed-off-by: mgoin <mgoin64@gmail.com>
  Signed-off-by: Dipika <dipikasikka1@gmail.com>
  Co-authored-by: Dipika <dipikasikka1@gmail.com>

f065de4e88 | Michael Goin | 2025-05-12 23:02:07 +00:00
  Fix FBGEMM integration (#18002)
  Signed-off-by: mgoin <mgoin64@gmail.com>

cd3edfc908 | Dipika Sikka | 2025-05-11 15:58:38 +08:00
  [Misc] Add compressed-tensors NVFP4A16 emulation support (#17914)
  Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
  Signed-off-by: Dipika <dipikasikka1@gmail.com>

d74e5f37bc | Jinzhen Lin | 2025-05-10 19:58:49 -07:00
  [Kernel] fp4 marlin kernel (#17687)
  Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

0c0fdae84f | Pavani Majety | 2025-05-09 16:24:41 -07:00
  [Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)

22481fbfa3 | Michael Goin | 2025-05-09 13:19:45 -04:00
  Update CT WNA16MarlinMoE integration (#16666)
  Signed-off-by: mgoin <mgoin64@gmail.com>

376786fac1 | Shu Wang | 2025-05-08 15:09:55 -07:00
  Add cutlass support for blackwell fp8 blockwise gemm (#14383)
  Signed-off-by: Shu Wang <shuw@nvidia.com>

4f605a6de5 | Michael Goin | 2025-05-08 15:56:59 -04:00
  Fix noisy warning for uncalibrated q_scale/p_scale (#17414)
  Signed-off-by: mgoin <mgoin64@gmail.com>

bb239a730f | fxmarty-amd | 2025-05-08 02:53:53 -07:00
  [Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612)
  Signed-off-by: Felix Marty <felmarty@amd.com>
  Signed-off-by: kewang2 <kewang2@amd.com>
  Co-authored-by: kewang2 <kewang2@amd.com>

db593aa67f | Bowen Bao | 2025-05-07 15:05:05 -04:00
  [Quantization] Quark MXFP4 format loading (#16943)

1a45a61387 | Szymon Ożóg | 2025-05-06 23:07:23 -07:00
  [Kernel] GGUF MoeVec kernel (#16780)
  Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
  Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
  Signed-off-by: Isotr0py <2037008807@qq.com>
  Co-authored-by: Isotr0py <2037008807@qq.com>

f9bc5a0693 | Mengqing Cao | 2025-05-06 17:53:09 +08:00
  [Bugfix] Fix triton import with local TritonPlaceholder (#17446)
  Signed-off-by: Mengqing Cao <cmq0113@163.com>

1d0c9d6b2d | Jinzhen Lin | 2025-05-05 09:39:30 -07:00
  [Kernel] some optimizations for dense marlin and moe marlin (#16850)
  Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

e3d0a1d190 | rasmith | 2025-05-02 21:41:10 -07:00
  [Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558)
  Signed-off-by: Randall Smith <Randall.Smith@amd.com>

9b103a1d76 | Eric Hartford | 2025-05-02 18:04:40 -07:00
  fix typo in logging (#17605)

868c546da4 | Michael Goin | 2025-05-02 10:03:32 -04:00
  Support W8A8 INT8 MoE for compressed-tensors (#16745)
  Signed-off-by: mgoin <mgoin64@gmail.com>

b4003d11fc | Michael Goin | 2025-05-02 04:32:54 +00:00
  Check if bitblas is installed during support check (#17572)
  Signed-off-by: mgoin <mgoin64@gmail.com>

24aebae177 | Michael Goin | 2025-05-01 17:59:35 -07:00
  [Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541)
  Signed-off-by: mgoin <mgoin64@gmail.com>

1144a8efe7 | NaLan ZeYu | 2025-04-30 19:51:45 -07:00
  [Bugfix] Temporarily disable gptq_bitblas on ROCm (#17411)
  Signed-off-by: Yan Cangang <nalanzeyu@gmail.com>

da4e7687b5 | Aaron Pham | 2025-04-30 08:06:58 -07:00
  [Fix] Support passing args to logger (#17425)
  Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

13698db634 | Harry Mellor | 2025-04-30 10:38:22 +08:00
  Improve configs - ModelConfig (#17130)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

0ed27ef66c | a2q1p | 2025-04-29 09:23:39 -07:00
  Fix: Spelling of inference (#17387)

ed2462030f | Charlie Fu | 2025-04-28 21:05:07 +00:00
  [Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. (#16854)
  Signed-off-by: charlifu <charlifu@amd.com>

c7941cca18 | Harry Mellor | 2025-04-28 16:55:31 +00:00
  Explicitly explain quant method override ordering and ensure all overrides are ordered (#17256)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

b6dd32aa07 | Harry Mellor | 2025-04-28 16:28:13 +00:00
  Make name of compressed-tensors quant method consistent across vLLM (#17255)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>