Files
nvfp4-megamoe-kernel/tests/unit
biondizzle c2ff8e072e Fix ALL loops: use self.n_kv_tiles everywhere
The MMA loop (cutlass.range) and MMA consumer loop (range) also used
cute.size(gK, mode=[3]) which returns 1 for all n. Fixed all 3 loops:
1. TMA load loop (cutlass.range, line 215)
2. MMA consumer loop (range, line 231)
3. Softmax loop (range, line 324)

This was causing the deadlock — MMA only produced S[0] while softmax
waited for S[1].
2026-05-23 00:33:38 +00:00
..