nvfp4-megamoe-kernel/tests/unit
biondizzle 7aaf9ccbda FIX: keep GMEM iteration dimension FREE in TMA K/V partition slices
Root cause of the multi-tile failure: the (None,0,None,0) slice hardcodes the
GMEM tile dimension to 0, so TMA always loads tile 0 regardless of
kvh.count. The GMEM iteration dimension sits at mode 1 for K (partitioned
by the QK MMA) and at mode 2 for V (partitioned by the PV MMA), because the
two operands have different layouts (K,D,L vs D,K,L).

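Fix, following the CUTLASS FMHA reference: slice with the GMEM iteration
mode left free, then index that mode with kvh.count:
- K: tBgK[(None,None,None,0)], then index the result with [(None, kvh.count, None)]
- V: tVgV[(None,0,None,0)], then index the result with [(None, kvh.count)]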
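The mode bookkeeping above can be checked with a small pure-Python model. This is a sketch under stated assumptions, not CuTe DSL code: `make_tensor`, `slice_tensor`, `tiles_hit`, `load_k`, `load_v`, `N_KV_TILES` and the toy shapes are all hypothetical. Each element stores its own full coordinate, and slicing mimics the CuTe convention: `None` keeps a mode free, an integer fixes it, and the result is rank-reduced. Mode 0 stands in for the TMA atom, and the last mode is L.

```python
from itertools import product

N_KV_TILES = 4  # hypothetical number of KV tiles along the sequence


def make_tensor(shape):
    # Toy stand-in for a TMA-partitioned GMEM tensor: each element stores
    # its own full coordinate, so we can see which tile a slice reaches.
    coords = product(*(range(n) for n in shape))
    return ({c: c for c in coords}, tuple(shape))


def slice_tensor(tensor, coord):
    # Simplified CuTe-style slicing: None keeps a mode free, an int fixes it.
    # The result keeps only the free modes, in their original order.
    data, shape = tensor
    free = [i for i, c in enumerate(coord) if c is None]
    out = {}
    for full, val in data.items():
        if all(c is None or full[i] == c for i, c in enumerate(coord)):
            out[tuple(full[i] for i in free)] = val
    return (out, tuple(shape[i] for i in free))


def tiles_hit(tensor, mode):
    # Distinct values of one ORIGINAL mode among the elements a slice can load.
    data, _ = tensor
    return {val[mode] for val in data.values()}


# Hypothetical shapes: K is (atom, KV iter, D, L), V is (atom, D, KV iter, L).
tBgK = make_tensor((2, N_KV_TILES, 2, 1))
tVgV = make_tensor((2, 2, N_KV_TILES, 1))


def load_k(count, buggy=False):
    if buggy:
        # Mode 1 (the GMEM KV iteration) is fixed to 0, so count is ignored.
        return slice_tensor(tBgK, (None, 0, None, 0))
    k = slice_tensor(tBgK, (None, None, None, 0))  # keep mode 1 free
    return slice_tensor(k, (None, count, None))  # then select KV tile `count`


def load_v(count):
    v = slice_tensor(tVgV, (None, 0, None, 0))  # fix mode 1 (D), keep KV mode free
    return slice_tensor(v, (None, count))  # then select KV tile `count`


for count in range(N_KV_TILES):
    print(count,
          tiles_hit(load_k(count, buggy=True), 1),
          tiles_hit(load_k(count), 1),
          tiles_hit(load_v(count), 2))
```

In this model the buggy K slice reaches only KV tile 0 for every count, while the fixed K and V slices reach exactly tile `count`. The V case needs no extra free mode because its KV iteration already sits at mode 2, which the (None,0,None,0) slice leaves free.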
2026-05-22 16:51:57 +00:00