[Kernel] Factor out epilogues from cutlass kernels (#5391)

Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: zifeitong <zifei.tong@parasail.io>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Tyler Michael Smith
2024-06-13 14:22:19 -04:00
committed by GitHub
parent 0ce7b952f8
commit 85657b5607
12 changed files with 274 additions and 232 deletions

@@ -261,7 +261,7 @@ class Fp8LinearMethod(LinearMethodBase):
qinput, x_scale = ops.scaled_fp8_quant(x, layer.input_scale)
# Fused GEMM_DQ
-output = ops.cutlass_scaled_mm_dq(
+output = ops.cutlass_scaled_mm(
qinput,
layer.weight,
out_dtype=x.dtype,