[BugFix] Add synchronize in CutlassW4A8LinearKernel to ensure data is ready for use. (#33078)

Co-authored-by: jinwuguo <jinwuguo@tencent.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Jinwu
2026-01-31 00:14:54 -08:00
committed by GitHub
parent d5c41db35b
commit f68e3ea4e1

@@ -77,6 +77,7 @@ class CutlassW4A8LinearKernel(MPLinearKernel):
         def transform_w_q(x):
             assert isinstance(x, BasevLLMParameter)
             convert_packed_uint4b8_to_signed_int4_inplace(x.data)
+            torch.cuda.synchronize()
             permute_param_layout_(x, input_dim=0, output_dim=1, packed_dim=0)
             x.data = ops.cutlass_encode_and_reorder_int4b(x.data.t().contiguous().t())
             return x
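
For context on the step that the new `torch.cuda.synchronize()` call waits on, here is a minimal pure-Python sketch of an in-place `uint4b8` to signed `int4` conversion on one packed 32-bit word. It assumes `uint4b8` means 4-bit fields stored with a bias of 8, and that eight fields are packed per 32-bit word. The helper names `uint4b8_to_int4_packed` and `unpack_signed_nibbles` are hypothetical and are not vLLM's implementation.

```python
def uint4b8_to_int4_packed(word: int) -> int:
    """Rewrite the eight 4-bit fields of a 32-bit word.

    Assumption: each field stores (value + 8), i.e. a bias of 8.
    The result stores each value as 4-bit two's complement.
    """
    for i in range(8):
        shift = 4 * i
        nib = (word >> shift) & 0xF            # stored field, in [0, 15]
        word &= ~(0xF << shift) & 0xFFFFFFFF   # clear the field in place
        word |= ((nib - 8) & 0xF) << shift     # value in [-8, 7], two's complement
    return word


def unpack_signed_nibbles(word: int) -> list[int]:
    """Decode the eight two's-complement 4-bit fields, lowest field first."""
    values = []
    for i in range(8):
        nib = (word >> (4 * i)) & 0xF
        values.append(nib - 16 if nib >= 8 else nib)
    return values


if __name__ == "__main__":
    packed = 0x12345678
    converted = uint4b8_to_int4_packed(packed)
    print(hex(converted))
    print(unpack_signed_nibbles(converted))
```

Under these assumptions, subtracting 8 modulo 16 is the same as flipping the top bit of each field, so the conversion is equivalent to XOR-ing the word with `0x88888888`. On the GPU the equivalent operations are queued asynchronously, which is why the commit synchronizes before the later layout and reorder steps read `x.data`.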