Congcong Chen
|
2c11a738b3
|
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
|
2025-07-12 06:02:10 -07:00 |
|
Michael Goin
|
d47661f0cd
|
[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 10:05:33 -06:00 |
|
Duncan Moss
|
5923ab9524
|
[fix]: disable cutlass block scaled group gemm for EP (#20781)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
|
2025-07-11 02:39:18 +00:00 |
|
nishith-fujitsu
|
c7753a9809
|
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129)
Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>
|
2025-07-10 15:59:04 +00:00 |
|
Tuan, Hoang-Trong
|
47043eb678
|
[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-09 12:53:55 -07:00 |
|
Wenxin Cheng
|
5eaf570050
|
Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue (#20142)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-09 00:30:18 +00:00 |
|
Ming Yang
|
c438183e99
|
[Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE (#20166)
Signed-off-by: Ming Yang <yming@meta.com>
|
2025-07-08 23:10:57 +00:00 |
|
Lucas Wilkinson
|
40b86aa05e
|
[BugFix] Fix: ImportError when building on hopper systems (#20513)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-06 12:17:30 +08:00 |
|
Vadim Gimpelson
|
f73d02aadc
|
[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-07-05 19:38:02 -07:00 |
|
Duncan Moss
|
3d184b95b8
|
[feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
|
2025-07-04 12:58:04 -06:00 |
|
Wentao Ye
|
783921d889
|
[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-04 15:06:24 +08:00 |
|
Joonchen Liau
|
9e5552aa13
|
[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)
Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-02 06:47:19 -06:00 |
|
Tyler Michael Smith
|
3be8d312a2
|
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-01 18:05:47 -07:00 |
|
周周周
|
9290de5667
|
remove unused variables in marlin_template.h (#20236)
|
2025-07-02 00:51:52 +00:00 |
|
Li, Jiang
|
6cc1e7d96d
|
[CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-01 07:25:03 +00:00 |
|
Richard Barnes
|
86debab54c
|
Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082)
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-01 06:48:10 +00:00 |
|
Tyler Michael Smith
|
e8c3bd2cd1
|
[Bugfix] Fix some narrowing conversion warnings (#20141)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-27 09:01:28 -07:00 |
|
Hosang
|
94a55c7681
|
[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-06-27 07:14:44 -07:00 |
|
li haoyang
|
0740e29b66
|
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:54:24 -07:00 |
|
Michael Goin
|
44d2e6af63
|
[Bugfix] Build moe_data for both sm100 and sm90 (#20086)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-26 20:50:12 -07:00 |
|
Ilya Markov
|
2d7779f888
|
[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:50:09 -07:00 |
|
Li, Jiang
|
0567c8249f
|
[CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-26 03:34:47 -07:00 |
|
Wentao Ye
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
Szymon Ożóg
|
dec66d253b
|
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-06-16 17:33:26 +08:00 |
|
Lu Fang
|
c6703d1e0d
|
[MISC] Remove unused variableds in C++ (#19609)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-15 20:05:28 -07:00 |
|
Ilya Markov
|
e13945f9dd
|
[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566)
|
2025-06-14 17:25:10 -07:00 |
|
jiahanc
|
294fc1e2c9
|
[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500)
|
2025-06-14 09:34:28 -07:00 |
|
Wentao Ye
|
ce9dc02c93
|
[Refactor] Remove unused variables in moe_permute_unpermute_kernel.inl (#19573)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-13 06:12:15 -07:00 |
|
Wentao Ye
|
b6efafd9e4
|
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-12 06:51:41 -07:00 |
|
Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
Louie Tsai
|
5c8d34a42c
|
Support no privileged mode on CPU for docker and kubernetes deployments (#19241)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-06-11 04:11:47 -07:00 |
|
ElizaWszola
|
84166fee97
|
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-06 18:26:11 -07:00 |
|
Chiyue Wei
|
61059bee40
|
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
|
2025-06-05 09:48:26 -07:00 |
|
Michael Goin
|
53a5a0ce30
|
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-04 10:46:28 -07:00 |
|
Lain
|
5f2cd251d2
|
Sm100 blockwise fp8 swap ab (#18564)
|
2025-06-04 07:48:45 -07:00 |
|
Kaixi Hou
|
41aa578428
|
[NVIDIA] Add Cutlass MLA backend (#17625)
|
2025-06-03 21:40:26 -07:00 |
|
Vadim Gimpelson
|
5d6d1adf15
|
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
|
2025-06-03 21:13:01 -07:00 |
|
Michael Goin
|
e31446b6c8
|
[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 13:48:25 -07:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Charlie Fu
|
306d60401d
|
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-05-31 07:40:05 -07:00 |
|
Lucas Wilkinson
|
ce75efeecb
|
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-05-28 08:59:39 +00:00 |
|
almersawi
|
a547aeb828
|
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
|
2025-05-27 00:07:53 -07:00 |
|
Yuqi Zhang
|
d0bc2f810b
|
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430)
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>
|
2025-05-23 01:41:37 -07:00 |
|
Tyler Michael Smith
|
6e588da0f4
|
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-22 12:13:54 -07:00 |
|
Hosang
|
dd5fa7e04f
|
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-05-21 08:35:00 -07:00 |
|
bnellnm
|
92247c522e
|
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-20 22:37:08 -07:00 |
|
Lucia Fang
|
258bf621d5
|
fix CUDA_check redefinition in #17918 (#18287)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-05-19 13:42:35 -07:00 |
|
Jinzhen Lin
|
e73b7dfd69
|
[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245)
|
2025-05-16 16:02:44 -07:00 |
|
Lain
|
e23564cb70
|
use ceil_div in cutlass block scaling shape check (#17918)
|
2025-05-16 03:02:58 -07:00 |
|