Kept: fmha_6warp_tma_multirow_multitile.cuh (production kernel)
Deleted: fmha_6warp.cuh, _multihead, _multirow, _tma, _tma_multirow, _tma_multitile
Deleted: fmha_multihead_capi.cu, fmha_multihead_op.py
production.py: Removed _dsv4_attention_fast_decode, unified dispatch to _dsv4_attention_multitile for all fast-path cases.