Update D2 status in README
@@ -146,7 +146,7 @@ Summary
 | B | ✅ COMPLETE | QK → identity softmax → P@V pipeline (TMEM alias, KV-tile interleaving) |
 | C | ✅ COMPLETE | Real online softmax. Kernel outputs un-norm O + LSE (no TMEM round-trip). Migrated to `dsv4/kernels/attention/fmha.py` as `FmhaKernel`. |
 | D1 | 🟡 hd≤256 DONE | Parameterized HEAD_DIM. qk_mma_tiler fix (hd=64/128/256 cos 0.999998). hd=512 SMEM fits but MLIR compilation hangs (>3hr). External k_sub merge proven impossible. |
-| D2 | TODO | Multi-query grid with head packing (128 Q heads, MQA) |
+| D2 | 🟡 Per-head DONE | Multi-query grid. Per-head launch works (cos 0.999998, n_h=64 hd=64). Multi-CTA grid deferred (requires tma_partition refactor). |
 | D3 | TODO | SWA sequence length mask (swa_lens per batch) |
 | D4 | TODO | Causal mask on SWA branch only |
 | D5 | 🟢 D5a+D5b DONE | D5a: normalize flag + LSE output (err=0.0). D5b: Python SWA+sink merge (cos 0.961). D5c/D5d: fused kernel merge TODO. |