From aac0fa1f081cb1a913f04730fc358a8d4297d062 Mon Sep 17 00:00:00 2001
From: biondizzle
Date: Sat, 30 May 2026 22:59:27 +0000
Subject: [PATCH] Update STATUS.md + MEMORY.md: single-shot inference verified

---
 STATUS.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/STATUS.md b/STATUS.md
index ccc40a6b..137f2df3 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -23,11 +23,14 @@
 
 ## Stage E Checklist (from ROADMAP/NEXT_PRIORITIES_PART_2)
 
-- [x] **E1:** Wire `LayerCacheHandle` → `gather_compressed_kv`, `gather_all_compressed_kv`, `gather_swa_kv`, `num_query_heads`, `head_dim` ✅
-- [x] **E2:** End-to-end smoke test through one full layer ✅ (SWA + CSA + HCA)
-- [x] **E3:** Top-level `model/dsv4.py` ✅
-- [x] **E4:** Delete `torch.cuda.synchronize()` from fast path ✅
-- [x] **E5:** Fold batch loop into kernel grid ✅ (single launch for batched decode)
+- [x] **E1:** Wire `LayerCacheHandle` → gather methods ✅
+- [x] **E2:** E2E smoke tests (SWA + CSA + HCA) ✅
+- [x] **E3:** DSV4Model class ✅
+- [x] **E4:** Removed `torch.cuda.synchronize` ✅
+- [x] **E5:** Batch loop folded into kernel grid ✅
+- [x] **Single-shot inference:** Full 61-layer pipeline runs on B200 ✅
+  - FMHA kernel verified: hd=512, 128 query heads, all layers correct
+  - Garbage output expected without mHC/MoE/KV-cache (architecture gaps, not kernel)
 - [ ] **E6:** FP4 output fusion for FMHA → wo_a
 - [ ] **E7:** Lightning indexer FP4 tensor-core scoring
 - [ ] **E8:** Multi-CTA grid for prefill