🎉 Mark D1.3 as SOLVED! SMEM-P rank mismatch fixed, enables hd>64 support
17
STAGE_D.md
@@ -1,5 +1,22 @@
# Stage D — Parameterized FMHA for DSV4
## 🎉 VICTORY: D1.3 SOLVED! (2026-05-23)
**After intensive debugging, the SMEM-P rank mismatch is resolved!**

**Problem:** The SMEM-P copy failed with "Expected source and destination tensors to have the same rank, but got 5 and 3".

**Root Cause:** The tensor passed to the SMEM copy used a TMEM layout with extra singleton modes, while the copy expected the QK C-fragment layout.

**Solution:** Create a tensor that views the same data with the QK C-fragment layout.

**Impact:** Enables hd>64 support (128, 256, 512). Multi-PV-tile works for hd=512 (2 tiles of 256 each).

**Status:** The kernel compiles and runs for all head dimensions. The SMEM-P path is enabled for hd>64.
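The rank mismatch and its fix can be illustrated with a small, library-free sketch. This is not the kernel's code: the `offsets` and `drop_singleton_modes` helpers and the shape and stride values are hypothetical, chosen only to show that a rank-5 layout with singleton modes and a rank-3 layout can address exactly the same elements, so a lower-rank view over the same data is enough to satisfy a copy that requires matching ranks.

```python
from itertools import product

def offsets(shape, stride):
    """List the memory offsets a (shape, stride) layout maps to,
    visiting coordinates in lexicographic order."""
    return [sum(c * s for c, s in zip(coord, stride))
            for coord in product(*(range(n) for n in shape))]

def drop_singleton_modes(shape, stride):
    """Return a layout with every size-1 mode removed.
    A size-1 mode only ever has coordinate 0, so it never contributes
    to the offset and the remaining modes address the same elements."""
    kept = [(n, s) for n, s in zip(shape, stride) if n != 1]
    return tuple(n for n, _ in kept), tuple(s for _, s in kept)

# Hypothetical rank-5 layout with two singleton modes (illustrative values only).
src_shape, src_stride = (4, 1, 2, 1, 3), (6, 0, 3, 0, 1)

# Rank-3 view over the same data.
view_shape, view_stride = drop_singleton_modes(src_shape, src_stride)

print(len(src_shape), len(view_shape))
print(offsets(src_shape, src_stride) == offsets(view_shape, view_stride))
```

In the real kernel the target is the QK C-fragment layout rather than a mechanically squeezed one, but the principle is the same: the data is not moved, only the layout used to index it changes.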
## ⚠️ IKEA INSTRUCTIONS — READ EVERY TIME BEFORE CODING
### The Workflow (DO NOT SKIP STEPS)