Files
nvfp4-megamoe-kernel/tests/unit
biondizzle a63f452c86 Fix softmax loop: use self.n_kv_tiles not cute.size(gK, mode=[3])
cute.size(gK, mode=[3]) returns 1 for ALL n values — mode 3 is batch,
not KV tiles. self.n_kv_tiles = s_k // 128 is the correct Python int.
This is why softmax only processed kt=0 for all n.
2026-05-23 00:30:49 +00:00
..