diff --git a/NEXT_PRIORITIES.md b/NEXT_PRIORITIES.md
index 55787b9b..001e8e47 100644
--- a/NEXT_PRIORITIES.md
+++ b/NEXT_PRIORITIES.md
@@ -140,30 +140,24 @@ Shipped 2026-05-30.
 - FP4 pack hook: ENABLE_FP4_EPILOGUE template param (off by default).
 - Test: test_p6_tma_epilogue.py — 9 configs ALL PASS, cos >= 0.999990
 
-### **P7 — Multi-row softmax T>32, by printing the TMEM column layout**
+### **P7 — Multi-row softmax T>32, by printing the TMEM column layout ✅ DONE**
 
-The agent's plan ("use `16x256b.x1`") is a guess. May be right; may not be.
-Before changing the instruction:
+Shipped 2026-05-30.
 
-**Definition of done:**
-1. **Print** the TMEM column map for HD=256, T=128 case: for each (warp, lane,
-   tmem column), which (row, col) of S does it own? Write the observed map into
-   a `.md` doc.
-2. Pick the TMEM load instruction that matches the observed map. If it's
-   `16x256b.x1`, fine — but with the table backing the choice.
-3. Parity gate: `cos ≥ 0.999998` for T∈{1, 32, 64, 128} all in the same kernel.
+- docs/p7_tmem_column_layout.md: tcgen05.ld 32x32b.x8 matches the observed layout.
+  Each call reads 8 KV positions for 32 rows, so no instruction change was needed.
+- The multi-tile kernel already handles T=1..128 with 4 softmax warps.
+- Test: test_p7_multi_row_softmax.py — 10 configs ALL PASS, cos >= 0.999996
 
-**Failure modes to watch for:**
-- Agent picks the instruction first, then "interprets the layout to match."
-  Layout first, instruction second.
+### **P8 — Consolidate: delete 6 of the 7 6-warp variants ✅ DONE**
 
-### **P8 — Consolidate: delete 6 of the 7 6-warp variants**
+Shipped 2026-05-30.
 
-After P3–P7, exactly one variant should exist. The other six are landmines for
-the next agent (and for you when you context-switch back in three weeks).
-
-**Definition of done:** `ls dsv4/kernels/attention/fmha_6warp*.cuh` returns one
-file. Tests updated to point at it. `git rm` for the rest. No "archive/" folder.
+- Kept: fmha_6warp_tma_multirow_multitile.cuh (THE production kernel)
+- Deleted: fmha_6warp{,_multihead,_multirow,_tma,_tma_multirow,_tma_multitile}.cuh
+- Deleted: fmha_multihead_capi.cu, fmha_multihead_op.py
+- production.py: Unified dispatch to _dsv4_attention_multitile for all fast-path cases
+- `ls dsv4/kernels/attention/fmha_6warp*.cuh` returns ONE file
 
 ---