nvfp4-megamoe-kernel/cutedsl/bridge.py
biondizzle a94011ec92 Fix torch.compile crash: remove threading.Lock from LUT cache path
Acquiring _NVFP4_STEP_LUT_LOCK raised 'Unsupported context manager' under
torch.compile/cudagraph. The LUT is now pre-populated during warmup, so
the fast path (cache hit) never takes a lock.

Also removed all init/warmup debug prints from CuTeDSL kernels.
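
The pattern described in this commit message can be sketched as below. This is a minimal illustration, not the code in bridge.py: the names `_STEP_LUT`, `_build_lut`, `warmup_lut` and `get_lut` are hypothetical, the table contents are a stand-in, and raising on a cache miss is an assumption (the real code may handle a miss differently).

```python
# Hypothetical sketch of a lock-free LUT cache that is filled during warmup.
# None of these names come from bridge.py.

_STEP_LUT = {}  # maps a step count to its precomputed table


def _build_lut(num_steps):
    # Stand-in for the real, presumably more expensive, table construction.
    return tuple(i / num_steps for i in range(num_steps + 1))


def warmup_lut(step_counts):
    """Fill the cache eagerly, before any compiled or captured region runs."""
    for n in step_counts:
        if n not in _STEP_LUT:
            _STEP_LUT[n] = _build_lut(n)


def get_lut(num_steps):
    """Fast path: a plain dict lookup with no lock and no context manager."""
    lut = _STEP_LUT.get(num_steps)
    if lut is None:
        # Assumption: a miss after warmup is treated as a setup error.
        raise RuntimeError(f"LUT for {num_steps} steps was not warmed up")
    return lut


warmup_lut([4])
table = get_lut(4)
```

Because all writes happen once during warmup, the lookup that runs inside the traced region is read-only and needs no `with lock:` block, which is the construct the tracer rejected.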
2026-05-18 20:54:55 +00:00

13 KiB