Update STAGE_D.md: manual SMEM addressing blocked on layout mapping
27
STAGE_D.md
@@ -778,15 +778,22 @@ The following are real potential wins but go beyond what the V4 paper explicitly
**Decision:** Manual SMEM addressing it is. Abandon `make_tiled_copy_C` entirely.

**Approach:**
1. Get thread's position in QK C-fragment partition
2. Compute which P values this thread owns (range in QK C-fragment space)
3. For each P value, compute destination SMEM address in PV A-operand layout
4. Write P values to computed SMEM addresses
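The four steps above can be modeled on the host in plain Python before attempting them inside the JIT. This is only a sketch under loud assumptions: it is not CuTeDSL, the ownership rule (thread `tid` owns row `tid` of P) is a placeholder for the real QK C-fragment partition, the destination formula assumes the SMEM P layout `((128,16),1,(4,2),1):((64,1),0,(16,8192),0)` with no swizzle, and the names `owned_coords`, `smem_offset`, and `scatter_p` are hypothetical.

```python
# Host-side model of the four steps; plain Python, not CuTeDSL.
TILE_M, TILE_K = 128, 128   # P tile: rows x columns
NUM_THREADS = 128           # ASSUMPTION: one thread per P row


def owned_coords(tid):
    """Steps 1-2 (ASSUMED ownership): thread `tid` owns every column of row `tid`."""
    return [(tid, k) for k in range(TILE_K)]


def smem_offset(m, k):
    """Step 3: offset under ((128,16),1,(4,2),1):((64,1),0,(16,8192),0).

    ASSUMPTION: mode 0 holds (row, k % 16), mode 2 holds ((k // 16) % 4, k // 64),
    and any swizzle on the real SMEM layout is ignored.
    """
    k0, k1, k2 = k % 16, (k // 16) % 4, k // 64
    return 64 * m + 1 * k0 + 16 * k1 + 8192 * k2


def scatter_p(p):
    """Step 4: every thread writes its P values to the computed offsets."""
    sP = [None] * (TILE_M * TILE_K)
    for tid in range(NUM_THREADS):
        for (m, k) in owned_coords(tid):
            sP[smem_offset(m, k)] = p[m][k]
    return sP


# Fill P with values that encode their own coordinate, then scatter.
p = [[m * 1000 + k for k in range(TILE_K)] for m in range(TILE_M)]
sP = scatter_p(p)
```

Swapping `owned_coords` for the real fragment ownership leaves the rest of the model unchanged, which makes it a cheap way to check a candidate mapping for gaps or collisions.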

**Status:** STUCK. Manual addressing is harder than expected due to CuTeDSL JIT constraints.

**Implementation Plan:**
- Use `cute.coord` to get thread's logical coordinates in QK C-fragment partition
- Compute mapping: (thread_coord, element_idx) → SMEM_offset
- Write via `sP[smem_offset] = p_value`

**Problems Encountered:**
1. `cute.coord` doesn't exist, so there is no direct way to get the thread's logical coordinates
2. Array indexing requires compile-time constants or vectorized loops
3. Layouts are completely different:
   - TMEM P layout: `((128,128),1,1):((65536,1),0,0)`
   - SMEM P layout: `((128,16),1,(4,2),1):((64,1),0,(16,8192),0)`
4. No clear mapping from TMEM coordinates to SMEM coordinates

**Expected Complexity:** Few hours. Need to understand QK C-fragment layout and PV A-operand SMEM layout coordinate systems.

**Root Issue:** Manual layout conversion in CuTeDSL requires understanding both coordinate systems and how each layout turns a coordinate into an offset, which is hard to work out without documentation or examples.
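As a starting point for the offset computation, a CuTe layout `shape:stride` maps a hierarchical coordinate to an offset by summing `coord * stride` over every leaf. The sketch below evaluates the two layouts quoted in this note that way, in plain Python rather than CuTeDSL. It assumes both layouts index the same logical (row, column) element of the 128×128 P tile, that column `k` splits as `k0 + 16 * (k1 + 4 * k2)` across the SMEM modes, and that no swizzle is applied; `crd2idx` here is a local helper written for this sketch, and `tmem_offset`, `smem_offset`, and `tmem_to_smem` are hypothetical names.

```python
def crd2idx(coord, shape, stride):
    """Offset of a hierarchical coordinate: sum of coord * stride over all leaves."""
    if isinstance(shape, tuple):
        return sum(crd2idx(c, s, d) for c, s, d in zip(coord, shape, stride))
    return coord * stride


# Layouts copied from this note (any swizzle is not modeled here).
TMEM_SHAPE, TMEM_STRIDE = ((128, 128), 1, 1), ((65536, 1), 0, 0)
SMEM_SHAPE, SMEM_STRIDE = ((128, 16), 1, (4, 2), 1), ((64, 1), 0, (16, 8192), 0)


def tmem_offset(m, k):
    """TMEM offset of logical element (row m, column k)."""
    return crd2idx(((m, k), 0, 0), TMEM_SHAPE, TMEM_STRIDE)


def smem_offset(m, k):
    """SMEM offset of logical element (row m, column k)."""
    # ASSUMPTION: column k splits as k0 + 16 * (k1 + 4 * k2) across modes 0 and 2.
    k0, k1, k2 = k % 16, (k // 16) % 4, k // 64
    return crd2idx(((m, k0), 0, (k1, k2), 0), SMEM_SHAPE, SMEM_STRIDE)


def tmem_to_smem(t):
    """Recover (m, k) from a TMEM offset, then re-encode it for SMEM."""
    m, k = divmod(t, 65536)  # row stride 65536, column stride 1, k < 128
    return smem_offset(m, k)
```

Under these assumptions the conversion reduces to `64 * m + (k % 64) + 8192 * (k // 64)`: each 64-column half of P is stored row-major with a row pitch of 64, and the second half starts 8192 elements later.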

**Options:**
1. Continue trying to implement manual conversion (high risk, time-consuming)
2. Find existing example of layout conversion in codebase
3. Ask for more specific guidance on coordinate mapping
4. Try different approach: make PV read from TMEM with different layout

**Blocked:** Need coordinate mapping formula or example.