youkaichao
|
555aa21905
|
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-31 20:22:34 +08:00 |
|
Charlie Fu
|
e85829450d
|
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-03-31 04:42:18 -07:00 |
|
ElizaWszola
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
Szymon Ożóg
|
a608160027
|
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-25 13:50:49 +00:00 |
|
Thien Tran
|
4f044b1d67
|
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-25 09:34:59 +00:00 |
|
Lu Fang
|
051da7efe3
|
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-25 15:36:45 +08:00 |
|
Jinzhen Lin
|
6b3cc75be0
|
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-24 09:21:33 -04:00 |
|
Lu Fang
|
d3ccbd6350
|
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-21 10:01:11 +08:00 |
|
Serena
|
64fc2193dc
|
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)
|
2025-03-18 05:50:19 -07:00 |
|
Lu Fang
|
cd0cd85102
|
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-17 16:40:41 +08:00 |
|
Li, Jiang
|
a2ae496589
|
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-03-14 22:07:36 -07:00 |
|
Lu Fang
|
8c0d15d5c5
|
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-15 12:40:09 +08:00 |
|
Yajie Wang
|
977a16772c
|
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430)
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
|
2025-03-14 09:55:14 -07:00 |
|
DefTruth
|
40253bab44
|
[Bugfix][W8A8] fixed cutlass block fp8 binding (#14796)
|
2025-03-14 03:32:42 -07:00 |
|
Thien Tran
|
27b50f1fe6
|
[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-13 23:47:49 -07:00 |
|
Jeff Daily
|
2a602b055a
|
forward fix PR 14245, restore build on ROCm 6.2 (#14709)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-13 20:40:15 -07:00 |
|
TJian
|
916836bbfb
|
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-12 09:31:19 -07:00 |
|
Sage Moore
|
45f3f3f59e
|
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-12 08:00:28 -04:00 |
|
Pavani Majety
|
debd6bbf09
|
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-03-12 05:13:11 +00:00 |
|
Szymon Ożóg
|
e22ee1e7a2
|
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
|
2025-03-12 03:33:27 +00:00 |
|
Lucas Wilkinson
|
07b4b7a37f
|
[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-11 17:09:03 +00:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic dispatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
Jinzhen Lin
|
90e88ab756
|
[Kernel] moe wna16 cuda kernel (#13321)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-03-10 20:12:40 -04:00 |
|
Szymon Ożóg
|
89cdaa83e7
|
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-10 07:30:04 -07:00 |
|
Lucas Wilkinson
|
7caff01a7b
|
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 08:11:56 +00:00 |
|
Jinzhen Lin
|
d0feea31c7
|
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-07 11:53:38 -05:00 |
|
Lucas Wilkinson
|
e5e03c2c1b
|
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)
|
2025-03-06 21:56:06 -08:00 |
|
Tyler Michael Smith
|
99b0915d3b
|
[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 14:17:09 -08:00 |
|
Dilip Gowda Bhagavan
|
ada19210a3
|
Adding cpu inference with VXE ISA for s390x architecture (#12613)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com>
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>
|
2025-03-06 08:40:53 -08:00 |
|
kushanam
|
f89978ad7c
|
add cutlass support for blackwell fp8 gemm (#13798)
|
2025-03-04 07:55:07 -08:00 |
|
TJian
|
848a6438ae
|
[ROCm] Faster Custom Paged Attention kernels (#12348)
|
2025-03-03 09:24:45 -08:00 |
|
Sheng Yao
|
09e56f9262
|
[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051)
|
2025-03-02 17:35:01 -08:00 |
|
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |
|
YajieWang
|
6a92ff93e1
|
[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)
|
2025-02-28 22:30:59 -08:00 |
|
Sage Moore
|
378b3ef6f8
|
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)
|
2025-02-26 20:04:12 -08:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
a31614e386
|
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2025-02-27 10:39:10 +08:00 |
|
Henry Tsang
|
094b7d9496
|
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797)
|
2025-02-25 18:52:03 -08:00 |
|
Gregory Shtrasberg
|
aabeb2688f
|
[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593)
|
2025-02-25 00:39:59 -08:00 |
|
Roger Wang
|
82e0d601fc
|
[CI/Build] Fix pre-commit errors from #13571 (#13709)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-02-22 16:50:38 -08:00 |
|
Kaixi Hou
|
e109e598c7
|
[NVIDIA] Support nvfp4 cutlass gemm (#13571)
|
2025-02-22 05:24:05 -08:00 |
|
Lucas Wilkinson
|
288cc6c234
|
[Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-02-21 15:30:12 -08:00 |
|
leoneo
|
839b27c6cc
|
[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978)
|
2025-02-20 22:14:24 -08:00 |
|
Szymon Ożóg
|
1cdc88614a
|
Missing comment explaining VDR variable in GGUF kernels (#13290)
|
2025-02-20 22:06:54 -08:00 |
|
Kaixi Hou
|
27a09dc52c
|
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)
|
2025-02-20 22:01:48 -08:00 |
|
Gregory Shtrasberg
|
0023cd2b9d
|
[ROCm] MI300A compile targets deprecation (#13560)
|
2025-02-19 23:05:00 -08:00 |
|
Sage Moore
|
c9f9d5b397
|
[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't built on ROCm (#13235)
|
2025-02-14 20:30:42 -08:00 |
|
Jinzhen Lin
|
8c32b08a86
|
[Kernel] Fix awq error when n is not divisible by 128 (#13227)
|
2025-02-13 20:07:05 -08:00 |
|
Tyler Michael Smith
|
c1e37bf71b
|
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-02-14 00:01:14 +00:00 |
|
Michael Goin
|
2344192a55
|
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-13 18:43:37 -05:00 |
|
Kaixi Hou
|
4fc5c23bb6
|
[NVIDIA] Support nvfp4 quantization (#12784)
|
2025-02-12 19:51:51 -08:00 |
|