Files
nvfp4-megamoe-kernel/cutedsl/shared_expert_pipeline.py
biondizzle 48386e34ad Fix torch.compile: use custom autograd Function instead of @torch.compiler.disable
torch.compile fullgraph mode can't handle @torch.compiler.disable (skips
the function and refuses to compile). Custom autograd Functions are treated
as opaque ops by torch.compile — they execute eagerly without the compiler
trying to trace into CuTeDSL internals (JIT, Path.cwd, etc).
2026-05-18 21:38:28 +00:00

12 KiB