FIX: (None,0,None,0) pre-slice keeps KV tile axis (mode 2) free
tBgK has 4 modes: (V_grouped, ?, KV_tiles, ?). Mode 2 is the GMEM tile dim.
Old slice (None,None,0,0) kept modes 0,1 free and fixed mode 2 at 0 → every kt read KV tile 0.
A no-op slice with 8 Nones FAILS: at JIT level the tensor has 4 modes, not 8.
Fix: (None,0,None,0) keeps modes 0,2 free → 2-mode tensor (V_grouped, KV_tiles).
Then tBgK[None, kt] indexes the surviving KV_tiles dim.
Matches the CUTLASS reference FMHA pattern:
    tKgK = tKgK_kdl[None, None, 0, batch]
    cute.copy(tma_k, tKgK[None, kv_coord], ...)
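To make the slicing semantics behind this fix concrete, here is a minimal pure-Python model, assuming CuTe's convention that None keeps a mode free and an integer fixes it. The helpers `make_tensor`, `slice_tensor` and `kv_tiles_read` are hypothetical and are not the CuTe DSL API. Each element stores its own 4-mode coordinate, so you can see which KV tile a slice reads. The shape (2, 1, 3, 1) is made up for illustration.

```python
# Hypothetical pure-Python model of CuTe-style slicing (NOT the CuTe DSL API):
# None keeps a mode free, an integer fixes that mode to a single coordinate.
from itertools import product


def make_tensor(shape):
    """Build a toy tensor whose elements are their own coordinate tuples."""
    coords = product(*(range(n) for n in shape))
    return {"shape": tuple(shape), "data": {c: c for c in coords}}


def slice_tensor(t, spec):
    """Apply a slice spec; raise if its mode count differs from the tensor's."""
    if len(spec) != len(t["shape"]):
        raise ValueError(
            f"slice has {len(spec)} modes, tensor has {len(t['shape'])}"
        )
    free = [i for i, s in enumerate(spec) if s is None]
    data = {}
    for coord, val in t["data"].items():
        # Keep the element only if it matches every fixed (integer) mode.
        if all(s is None or coord[i] == s for i, s in enumerate(spec)):
            data[tuple(coord[i] for i in free)] = val
    return {"shape": tuple(t["shape"][i] for i in free), "data": data}


def kv_tiles_read(tile):
    """Set of KV tile indices (original mode 2) covered by a slice."""
    return {val[2] for val in tile["data"].values()}


# Toy tBgK with modes (V_grouped, ?, KV_tiles, ?); sizes are made up.
tBgK = make_tensor((2, 1, 3, 1))

old = slice_tensor(tBgK, (None, None, 0, 0))  # modes 0,1 free; mode 2 fixed at 0
new = slice_tensor(tBgK, (None, 0, None, 0))  # modes 0,2 free

print(old["shape"], kv_tiles_read(old))  # (2, 1) {0}
for kt in range(3):
    # Indexing the surviving KV_tiles mode selects tile kt.
    print(kt, kv_tiles_read(slice_tensor(new, (None, kt))))  # kt {kt}

try:
    slice_tensor(tBgK, (None,) * 8)  # mode-count mismatch, like the 8-None slice
except ValueError as e:
    print(e)  # slice has 8 modes, tensor has 4
```

Pre-slicing once outside the loop and indexing only the tile mode inside it mirrors the tKgK / kv_coord split in the CUTLASS snippet; the model only tracks coordinates and says nothing about TMA or GMEM layout.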