Files
nvfp4-megamoe-kernel/dsv4
biondizzle 91c370360a Fix CuTeDSL from_dlpack device mismatch in CUDA graph capture (v3)
Patch torch.cuda.current_device to return the tensor's device index
during from_dlpack calls inside CUDA graph capture. This bypasses the
device check in __dlpack__ without changing the CUDA stream (which
caused 'Capture must end on the same stream' in v1) and without
triggering a cross-device copy (which caused 'Cannot copy between
CPU and CUDA tensors' in v2).
2026-06-03 21:09:12 +00:00
..