DeepGEMM/csrc/jit_kernels
biondizzle c0850a6859 Fix weight TMA descriptors: packed E2M1 needs K/2, block_k/2, swizzle/2
Weights are packed E2M1 (2 per byte), but the TMA descriptors were built with
unpacked dimensions: the K dimension in elements instead of bytes, a 128B
swizzle instead of 64B, and the full block_k instead of block_k/2. This caused
OOB reads and a swizzle mismatch with the UMMA descriptor, which produced
illegal instruction traps.
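The arithmetic behind the fix can be sketched as follows. This is a minimal illustration, not the code from the commit: `WeightTmaParams`, `unpacked_params` and `packed_e2m1_params` are hypothetical names, not DeepGEMM helpers or the CUDA driver's tensor-map API. It also assumes that the swizzle span is chosen to match the byte width of one box row, which is consistent with the 128B to 64B change for a 128-element block_k. The point is that every K-direction quantity must be converted from 4-bit elements to bytes before it goes into the descriptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter bundle for illustration only; it is not the
// DeepGEMM descriptor helper or the CUDA driver's tensor-map API.
struct WeightTmaParams {
    uint64_t inner_dim;  // K extent of the global weight tensor, in bytes
    uint32_t inner_box;  // K extent of one TMA box (one block_k tile), in bytes
    uint32_t swizzle;    // swizzle span in bytes (assumed equal to inner_box)
};

// E2M1 (FP4) values are 4 bits wide, so two of them are packed into one byte.
constexpr uint32_t kE2M1PerByte = 2;

// Buggy variant: passes K and block_k through as if they were byte counts,
// even though they count 4-bit elements, so every K-direction value is 2x too large.
WeightTmaParams unpacked_params(uint64_t k, uint32_t block_k) {
    return {k, block_k, block_k};
}

// Fixed variant: convert element counts to bytes before building the descriptor.
WeightTmaParams packed_e2m1_params(uint64_t k, uint32_t block_k) {
    assert(k % kE2M1PerByte == 0 && block_k % kE2M1PerByte == 0);
    const uint32_t box_bytes = block_k / kE2M1PerByte;  // block_k / 2
    WeightTmaParams p{k / kE2M1PerByte,                 // K / 2
                      box_bytes,
                      box_bytes};                       // half the buggy swizzle
    // TMA only offers 32B, 64B and 128B swizzle spans.
    assert(p.swizzle == 32 || p.swizzle == 64 || p.swizzle == 128);
    return p;
}
```

With illustrative values K = 7168 and block_k = 128, the fixed path gives a 3584-byte K extent, a 64-byte box and a 64B swizzle. The unpacked path instead describes rows of 7168 bytes with a 128B swizzle, i.e. a tensor twice as wide as the buffer and a swizzle pattern that disagrees with what the UMMA side expects.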
2026-05-12 06:51:39 +00:00