Files
nvfp4-megamoe-kernel/cutedsl
biondizzle 0a5cfe0433 add kernel compile caching — compile once, invoke on subsequent calls
First call: cute.compile() runs with real tensors (warmup).
Subsequent calls: invoke the cached compiled() callable with new CuTe views.
After warmup, the forward path contains no cute.compile() call, which keeps it safe for CUDA graph capture.
2026-05-16 20:45:46 +00:00
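
The compile-once pattern described in the commit message can be sketched roughly as follows. This is not the repository's code: `CompiledKernelCache`, `fake_compile`, `scale`, and the cache key are hypothetical names used for illustration, and `fake_compile` is a plain-Python stand-in for the role `cute.compile()` plays. The sketch also assumes the first call runs the kernel right after compiling it, which the commit message does not state.

```python
from typing import Any, Callable, Dict, Hashable


class CompiledKernelCache:
    """Compile a kernel on first use, then reuse the compiled callable."""

    def __init__(self, compile_fn: Callable[..., Callable[..., Any]]):
        # compile_fn is a stand-in for the real JIT compile step (assumption).
        self._compile_fn = compile_fn
        self._compiled: Dict[Hashable, Callable[..., Any]] = {}
        self.compile_count = 0

    def __call__(self, key: Hashable, kernel: Callable[..., Any], *args: Any) -> Any:
        compiled = self._compiled.get(key)
        if compiled is None:
            # First call for this key: compile with the real arguments (warmup).
            compiled = self._compile_fn(kernel, *args)
            self._compiled[key] = compiled
            self.compile_count += 1
        # Every call, including the first, runs the cached compiled callable.
        return compiled(*args)


def fake_compile(kernel, *example_args):
    """Hypothetical compile step: wraps the kernel and returns a callable."""
    def compiled(*args):
        return kernel(*args)
    return compiled


def scale(x, factor):
    """Toy 'kernel' used only to exercise the cache."""
    return [v * factor for v in x]


cache = CompiledKernelCache(fake_compile)
first = cache(("scale", 3), scale, [1, 2, 3], 2)   # compiles, then runs
second = cache(("scale", 3), scale, [4, 5, 6], 2)  # reuses the compiled callable
```

Keying the cache on whatever determines the generated code (here a name and a length) means a change in those properties triggers a fresh compile, while new inputs with the same properties only pay the invocation cost. For graph capture, the warmup call would need to happen before capture begins so that no compile step runs inside the captured region.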