Update STATUS.md + MEMORY.md: single-shot inference verified

2026-05-30 22:59:27 +00:00
parent 11c010e567
commit aac0fa1f08

@@ -23,11 +23,14 @@
## Stage E Checklist (from ROADMAP/NEXT_PRIORITIES_PART_2)
- [x] **E1:** Wire `LayerCacheHandle` → `gather_compressed_kv`, `gather_all_compressed_kv`, `gather_swa_kv`, `num_query_heads`, `head_dim`
- [x] **E2:** End-to-end smoke test through one full layer ✅ (SWA + CSA + HCA)
- [x] **E3:** Top-level `model/dsv4.py`
- [x] **E4:** Delete `torch.cuda.synchronize()` from fast path
- [x] **E5:** Fold batch loop into kernel grid ✅ (single launch for batched decode)
- [x] **E1:** Wire `LayerCacheHandle` → gather methods
- [x] **E2:** E2E smoke tests (SWA + CSA + HCA)
- [x] **E3:** DSV4Model class
- [x] **E4:** Removed `torch.cuda.synchronize`
- [x] **E5:** Batch loop folded into kernel grid ✅
- [x] **Single-shot inference:** Full 61-layer pipeline runs on B200 ✅
- FMHA kernel verified: hd=512, 128 query heads, all layers correct
- Garbage output expected without mHC/MoE/KV-cache (architecture gaps, not kernel)
- [ ] **E6:** FP4 output fusion for FMHA → wo_a
- [ ] **E7:** Lightning indexer FP4 tensor-core scoring
- [ ] **E8:** Multi-CTA grid for prefill