From d995cd0c5c1ee6d4d556c893cf540fc978c87400 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Sat, 23 May 2026 18:26:15 +0000
Subject: [PATCH] =?UTF-8?q?=F0=9F=8E=89=20Mark=20D1.3=20as=20SOLVED!=20SME?=
 =?UTF-8?q?M-P=20rank=20mismatch=20fixed,=20enables=20hd>64=20support?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 STAGE_D.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/STAGE_D.md b/STAGE_D.md
index adeb1b62..da83cd5c 100644
--- a/STAGE_D.md
+++ b/STAGE_D.md
@@ -1,5 +1,22 @@
 # Stage D — Parameterized FMHA for DSV4
 
+
+## 🎉 VICTORY: D1.3 SOLVED! (2026-05-23)
+
+**After intensive debugging, SMEM-P rank mismatch issue resolved!**
+
+**Problem:** SMEM-P copy failed with "Expected source and destination tensors to have the same rank, but got 5 and 3"
+
+**Root Cause:** tensor used TMEM layout () with extra singleton modes, while SMEM copy expected QK C-fragment layout.
+
+**Solution:** Create tensor viewing same data with QK C-fragment layout ():
+
+
+**Impact:** Enables hd>64 support (128, 256, 512). Multi-PV-tile works for hd=512 (2 tiles of 256 each).
+
+**Status:** Kernel compiles and runs for all head dimensions. SMEM-P path enabled for hd>64.
+
+
 ## ⚠️ IKEA INSTRUCTIONS — READ EVERY TIME BEFORE CODING
 
 ### The Workflow (DO NOT SKIP STEPS)
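
Note on the rank mismatch described in the patch: the general idea (a rank-5 layout with singleton modes rejected by a copy that expects rank 3, then fixed by re-viewing the same data under a rank-3 layout) can be illustrated with a toy model. This is only a sketch in plain Python, not the kernel code and not the CuTe API. The shapes, strides, and the helpers `offset`, `checked_copy`, and `drop_singleton_modes` are all hypothetical, and the actual fix uses a specific QK C-fragment layout that the patch does not show.

```python
from itertools import product


def offset(coord, stride):
    """Map a logical coordinate to a linear offset: dot(coord, stride)."""
    return sum(c * s for c, s in zip(coord, stride))


def checked_copy(src, dst):
    """Toy element-wise copy between (data, shape, stride) triples.

    Mimics a copy that refuses tensors whose ranks differ.
    """
    s_data, s_shape, s_stride = src
    d_data, d_shape, d_stride = dst
    if len(s_shape) != len(d_shape):
        raise ValueError(
            "Expected source and destination tensors to have the same rank, "
            f"but got {len(s_shape)} and {len(d_shape)}"
        )
    if tuple(s_shape) != tuple(d_shape):
        raise ValueError("shape mismatch")
    for coord in product(*(range(n) for n in s_shape)):
        d_data[offset(coord, d_stride)] = s_data[offset(coord, s_stride)]


def drop_singleton_modes(shape, stride):
    """Build a lower-rank view of the same data by removing size-1 modes."""
    kept = [(n, s) for n, s in zip(shape, stride) if n != 1]
    return tuple(n for n, _ in kept), tuple(s for _, s in kept)


# Hypothetical source: rank 5, with two singleton modes (stride 0).
src_data = list(range(24))
src_shape, src_stride = (4, 1, 2, 1, 3), (1, 0, 4, 0, 8)

# Hypothetical destination: rank 3, same logical extent, different strides.
dst_data = [None] * 24
dst_shape, dst_stride = (4, 2, 3), (6, 3, 1)

# Copying the rank-5 tensor directly reproduces the rank error.
rank_error = None
try:
    checked_copy((src_data, src_shape, src_stride),
                 (dst_data, dst_shape, dst_stride))
except ValueError as err:
    rank_error = str(err)

# Re-view the same buffer with a rank-3 layout; no data is moved or copied
# to build the view, only the (shape, stride) description changes.
view_shape, view_stride = drop_singleton_modes(src_shape, src_stride)
checked_copy((src_data, view_shape, view_stride),
             (dst_data, dst_shape, dst_stride))
```

The point of the sketch is that the view shares `src_data` with the original tensor: only the layout metadata differs, which is why a change of this kind can satisfy a rank check without touching the underlying values.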