biondizzle
e1fc4cee60
D1: paired atoms epilogue (no TMEM round-trip)
Replace NO-OP round-trip + normalize + epilogue_tma_store with:
- get_tmem_load_op + get_smem_store_op paired atoms
- One-way TMEM→reg (normalize) →SMEM→GMEM
- Eliminates ~3% error from TMEM layout mismatch
- O rescale disabled (single KV tile only for now)
- Pre-computed TMA partitions outside if blocks
2026-05-23 03:29:51 +00:00
..
2026-05-21 17:30:44 +00:00
2026-05-21 17:30:44 +00:00
2026-05-23 03:29:51 +00:00
2026-05-22 17:07:23 +00:00
2026-05-16 02:13:18 +00:00
2026-05-22 17:08:12 +00:00
2026-05-23 00:17:07 +00:00