[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Committed by GitHub
Parent: ba4a78eb5d
Commit: 8332078cfd
@@ -259,8 +259,12 @@ def flashinfer_trtllm_mxint4_moe(
         routed_scaling_factor=None,
         routing_method_type=routing_method_type,
         enable_pdl=None,
+        do_finalize=True,
         output=None,
         tune_max_num_tokens=8192,
-    ).to(x.dtype)
+    )
+    if isinstance(out, (tuple, list)):
+        out = out[0]
+    out = out.to(x.dtype)
 
     return out
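The patched code unwraps a tuple or list result before casting it with `.to(x.dtype)`. The sketch below shows that pattern in isolation. It does not call FlashInfer: `FakeTensor`, `fake_moe_kernel` and `normalize_output` are hypothetical stand-ins, dtypes are plain strings, and the assumption that the kernel may return either a single output or a sequence whose first element is the output comes from the `isinstance` check in the diff, not from FlashInfer documentation.

```python
from typing import Any, Sequence, Union


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only tracks its dtype."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        # Mimic Tensor.to(dtype): return a new object with the target dtype.
        return FakeTensor(dtype)


def fake_moe_kernel(return_sequence: bool) -> Union[FakeTensor, Sequence[FakeTensor]]:
    """Hypothetical kernel: returns a bare output, or a tuple whose first item is the output."""
    out = FakeTensor("float32")
    return (out, FakeTensor("int32")) if return_sequence else out


def normalize_output(out: Any, dtype: str) -> FakeTensor:
    # Same shape as the patched code: unwrap a tuple/list to its first
    # element, then cast. A tuple has no .to() method, so casting first fails.
    if isinstance(out, (tuple, list)):
        out = out[0]
    return out.to(dtype)


# Both return shapes end up as a single output with the requested dtype.
single = normalize_output(fake_moe_kernel(return_sequence=False), "bfloat16")
wrapped = normalize_output(fake_moe_kernel(return_sequence=True), "bfloat16")

# The pre-fix pattern chains .to() onto the call result directly:
# fake_moe_kernel(return_sequence=True).to("bfloat16")  -> AttributeError
```

Chaining `.to()` directly onto the call, as the removed line did, only works when the result is a single tensor-like object. A tuple raises `AttributeError`, which is one plausible way the unpatched path could crash.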