Integration SM100 FlashInfer fused allreduce RMSNorm (#20691)

Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Ilya Markov
2025-07-12 03:58:15 +02:00
committed by GitHub
parent 7b828e30d5
commit fc0f41d10a
4 changed files with 514 additions and 6 deletions

@@ -3962,6 +3962,10 @@ class PassConfig:
"""Whether to enable sequence parallelism."""
enable_async_tp: bool = False
"""Whether to enable async TP."""
enable_fi_allreduce_fusion: bool = False
"""Whether to enable flashinfer allreduce fusion."""
fi_allreduce_fusion_max_token_num: int = 1024
"""Max number of tokens to use in flashinfer allreduce fusion."""
# TODO(luka) better pass enabling system.
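
A minimal sketch of how the two new fields might be consulted together. `PassConfigSketch` and `use_fused_allreduce` are hypothetical names used only for illustration, and the inclusive `<=` comparison is an assumption; the hunk above shows only the field definitions and defaults, not how the pass applies them.

```python
from dataclasses import dataclass


@dataclass
class PassConfigSketch:
    """Hypothetical stand-in mirroring the two fields added in this hunk."""

    enable_fi_allreduce_fusion: bool = False
    fi_allreduce_fusion_max_token_num: int = 1024


def use_fused_allreduce(cfg: PassConfigSketch, num_tokens: int) -> bool:
    """Assumed gating rule (not taken from the commit): fuse only when the
    flag is on and the token count is within the configured limit."""
    return (
        cfg.enable_fi_allreduce_fusion
        and num_tokens <= cfg.fi_allreduce_fusion_max_token_num
    )
```

Under these assumptions the fusion stays off by default, and turning the flag on applies it only to batches at or below the token limit.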