Update STAGE_D.md with D5b results: merge cos 0.961, LSE err=0.0

2026-05-23 21:45:22 +00:00
parent df6a2a03cb
commit 0fa1189937

@@ -147,10 +147,11 @@ acc_vec = cute.math.fmin(cute.math.fmax(acc_vec, -swiglu_limit), swiglu_limit)
4. **D5d:** Fuse sink merge into kernel epilogue. Pure optimization.
**Status:** 🟢 D5b DONE (May 23, 2026). Pipeline works at hd=64:
-- Run FMHA (normalize=True, LSE output) for compressed KV → O_comp, lse_comp
-- Run FMHA (normalize=True, LSE output) for SWA KV → O_swa, lse_swa
-- Merge: `O = (exp(lse1)*O1 + exp(sink)*exp(lse2)*O2) / (exp(lse1) + exp(sink)*exp(lse2))`
-- Merge cos 0.969, individual attention cos 0.973/0.970, LSE err=0.0
+- Run FMHA (normalize=False, LSE output) for compressed KV → O_unnorm_comp, lse_comp
+- Run FMHA (normalize=False, LSE output) for SWA KV → O_unnorm_swa, lse_swa
+- Un-normalized merge: `O = (O_unnorm_comp + exp(sink)*O_unnorm_swa) / (exp(lse1) + exp(sink)*exp(lse2))`
+- Merge cos 0.961, individual attention cos 0.963/0.960, LSE err=0.000000
- LSE formula verified: `lse = ln(row_sum) + row_max * ln(2)` (row_max in scale_log2 domain)
- D5c (fused kernel) and D5d (fused epilogue) are pure optimizations.
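
The merge and LSE formulas in the status list above can be checked on the host with a small pure-Python reference. This is a minimal sketch under stated assumptions, not the kernel's implementation: all function names are hypothetical, `sink` is treated as an additive log-weight on the SWA branch exactly as the formulas are written, and "un-normalized" is assumed to mean `sum_j exp(s_j) * v_j` with no row-max offset (a kernel that keeps such an offset would need an extra rescale before merging).

```python
import math

def attend(scores, values):
    """Single-query softmax attention. Returns (normalized O, lse)."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    row_sum = sum(w)
    dim = len(values[0])
    o = [sum(wi * v[d] for wi, v in zip(w, values)) / row_sum for d in range(dim)]
    return o, m + math.log(row_sum)

def merge_normalized(o1, lse1, o2, lse2, sink=0.0):
    """O = (exp(lse1)*O1 + exp(sink)*exp(lse2)*O2) / (exp(lse1) + exp(sink)*exp(lse2))."""
    a, b = lse1, lse2 + sink
    m = max(a, b)  # subtract the larger exponent so exp() cannot overflow
    w1, w2 = math.exp(a - m), math.exp(b - m)
    return [(w1 * x + w2 * y) / (w1 + w2) for x, y in zip(o1, o2)]

def merge_unnormalized(u1, lse1, u2, lse2, sink=0.0):
    """O = (U1 + exp(sink)*U2) / (exp(lse1) + exp(sink)*exp(lse2)).

    Assumption: U = exp(lse) * O, i.e. no row-max offset is folded into U.
    """
    es = math.exp(sink)
    denom = math.exp(lse1) + es * math.exp(lse2)
    return [(x + es * y) / denom for x, y in zip(u1, u2)]

def lse_from_log2_domain(scores):
    """lse = ln(row_sum) + row_max * ln(2), with row_max in the log2 domain."""
    t = [s * math.log2(math.e) for s in scores]     # scores scaled into log2 domain
    row_max = max(t)
    row_sum = sum(2.0 ** (x - row_max) for x in t)  # exp2-based row sum
    return math.log(row_sum) + row_max * math.log(2.0)

# Toy data (made up for illustration): one query, head dim 2.
comp_scores, comp_vals = [0.3, -1.2, 2.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
swa_scores, swa_vals = [1.1, 0.4], [[2.0, -1.0], [-1.0, 3.0]]
sink = 0.25

o_comp, lse_comp = attend(comp_scores, comp_vals)
o_swa, lse_swa = attend(swa_scores, swa_vals)

# Reference: one softmax over both KV sets, SWA scores shifted by sink.
ref, _ = attend(comp_scores + [s + sink for s in swa_scores], comp_vals + swa_vals)

merged_norm = merge_normalized(o_comp, lse_comp, o_swa, lse_swa, sink)
u_comp = [math.exp(lse_comp) * x for x in o_comp]
u_swa = [math.exp(lse_swa) * x for x in o_swa]
merged_unnorm = merge_unnormalized(u_comp, lse_comp, u_swa, lse_swa, sink)
```

Under these assumptions both merges reproduce the single-softmax reference to floating-point precision, so the gap between the reported merge cosine and 1.0 would come from the kernel's precision rather than from the merge algebra. The normalized variant subtracts the larger exponent before calling `exp`, which is the safer form when the LSE values are large.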
### CG-4: Inverse RoPE Verification ⚠️ HIGH