[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Benjamin Chislett
2026-04-08 20:36:33 -04:00
committed by GitHub
parent ba4a78eb5d
commit 8332078cfd
2 changed files with 98 additions and 1 deletion

@@ -259,8 +259,12 @@ def flashinfer_trtllm_mxint4_moe(
         routed_scaling_factor=None,
         routing_method_type=routing_method_type,
         enable_pdl=None,
+        do_finalize=True,
         output=None,
         tune_max_num_tokens=8192,
-    ).to(x.dtype)
+    )
+    if isinstance(out, (tuple, list)):
+        out = out[0]
+    out = out.to(x.dtype)
     return out
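
A minimal sketch of the return-value guard in this hunk, assuming the kernel call may return either a single tensor or a tuple/list whose first element is the output tensor. `normalize_moe_output` and `FakeTensor` are hypothetical names used only for illustration; they are not part of vLLM or FlashInfer, and `FakeTensor` stands in for a real tensor so the snippet runs without a GPU.

```python
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for a tensor: only tracks a dtype label."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        # Mimics a dtype cast by returning a new object with the new label.
        return FakeTensor(dtype)


def normalize_moe_output(out: Any, target_dtype: str) -> Any:
    """Hypothetical helper mirroring the patched tail of the function.

    Unwraps a tuple/list result to its first element, then casts it.
    """
    if isinstance(out, (tuple, list)):
        out = out[0]
    return out.to(target_dtype)


# A bare tensor and a wrapped tensor both end up cast to the target dtype.
bare = normalize_moe_output(FakeTensor("float32"), "bfloat16")
wrapped = normalize_moe_output([FakeTensor("float32")], "bfloat16")

# The pre-patch pattern, calling .to() directly on the result, fails as soon
# as the result is a list, because a list has no .to() method.
try:
    [FakeTensor("float32")].to("bfloat16")
    old_pattern_failed = False
except AttributeError:
    old_pattern_failed = True
```

Unwrapping before the cast keeps the function's return type the same whichever shape the underlying call produces.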