[Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
@@ -650,4 +650,4 @@ def cast_overflow_tensors(
     if tensors.isinf().any() or tensors.isnan().any():
         clamp_value = torch.finfo(tensors.dtype).max - offset
         tensors = torch.clamp(tensors, min=-clamp_value, max=clamp_value)
-        return tensors
+    return tensors
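For context, a minimal sketch of how the patched helper plausibly reads after this change, with the return dedented so the tensor is returned on the non-overflow path as well. Only the hunk body is visible above, so the full signature, the `offset` parameter default, and the type hints are assumptions:

import torch


def cast_overflow_tensors(
    tensors: torch.Tensor,
    offset: float = 1000.0,  # assumed default; not visible in the hunk
) -> torch.Tensor:
    # If any element overflowed (inf) or is NaN, clamp values into a finite
    # range slightly below the dtype's maximum representable value.
    if tensors.isinf().any() or tensors.isnan().any():
        clamp_value = torch.finfo(tensors.dtype).max - offset
        tensors = torch.clamp(tensors, min=-clamp_value, max=clamp_value)
    # Return the (possibly clamped) tensor unconditionally, not only when
    # an overflow was detected.
    return tensors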