[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735)

Signed-off-by: Richard Zou <zou3519@gmail.com>
This commit is contained in:
Richard Zou
2026-02-08 13:42:56 -05:00
committed by GitHub
parent a263aa6140
commit 4df841fe75
3 changed files with 18 additions and 12 deletions


@@ -593,7 +593,7 @@ class CompilationConfig:
     local_cache_dir: str = field(default=None, init=False) # type: ignore
     """local cache dir for each rank"""
-    fast_moe_cold_start = True
+    fast_moe_cold_start: bool | None = None
     """Optimization for fast MOE cold start.
     This is a bit of a hack that assumes that:
@@ -604,8 +604,14 @@ class CompilationConfig:
     When the above two conditions hold, this option greatly decreases cold start
     time for MOE models.
-    If the above two conditions don't hold, then this option will lead to silent
-    incorrectness. The only condition in which this doesn't hold is speculative
+    The options are:
+    - True: the optimization is always on
+    - False: the optimization is always off
+    - None: the optimization is usually on, but off for speculative decoding
+    If conditions 1 and 2 don't hold, then this option will lead to silent
+    incorrectness.
+    The only situation in which they don't hold is speculative
     decoding, where there is a draft model that may have MOEs in it.
     NB: We're working on a longer-term solution that doesn't need these assumptions.