[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Jinzhen Lin
2025-04-06 11:04:22 +08:00
committed by GitHub
parent 13affc432d
commit 2fa66ef713
2 changed files with 6 additions and 2 deletions
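
The hunk for `should_use_atomic_add_reduce` contains two `return` expressions. Assuming the usual diff ordering (the first is the removed line, the second its replacement), the sketch below compares them in isolation. The function names `with_m` and `without_m` are hypothetical and exist only for this illustration; the `device` and `dtype` parameters of the real function are left out because neither expression uses them.

```python
def with_m(m: int, n: int, k: int) -> bool:
    # First return expression in the hunk: the decision depends on m.
    return max(m, 64) * n < 64 * 2048 and k >= 2048


def without_m(n: int, k: int) -> bool:
    # Second return expression in the hunk: m no longer participates.
    return n < 2048 and k >= 2048


if __name__ == "__main__":
    # For m <= 64, max(m, 64) == 64, so both expressions reduce to n < 2048.
    print(with_m(16, 1024, 4096), without_m(1024, 4096))
    # For m > 64 the first expression is stricter than the second.
    print(with_m(128, 1024, 4096), without_m(1024, 4096))
```

For batch sizes up to 64 the two expressions agree; they differ only for larger `m`, where the second one depends solely on `n` and `k`.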

@@ -305,7 +305,7 @@ def should_use_atomic_add_reduce(m: int, n: int, k: int, device: torch.device,
     # the performance of atomicAdd is better than global reduce
     # only when m*n is small and k is large
-    return max(m, 64) * n < 64 * 2048 and k >= 2048
+    return n < 2048 and k >= 2048
 def apply_gptq_marlin_linear(