biondizzle
e1fc4cee60
D1: paired atoms epilogue (no TMEM round-trip)
Replace NO-OP round-trip + normalize + epilogue_tma_store with:
- get_tmem_load_op + get_smem_store_op paired atoms
- One-way path: TMEM → reg (normalize) → SMEM → GMEM
- Eliminates ~3% error from TMEM layout mismatch
- O rescale disabled (single KV tile only for now)
- Pre-computed TMA partitions outside if blocks
2026-05-23 03:29:51 +00:00
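
Why disabling the O rescale is safe only for a single KV tile can be shown with a small host-side model. The sketch below is plain Python, not the kernel code. The function names `attend` and `reference` and the toy data are hypothetical, and it assumes the usual online-softmax formulation, in which the accumulator is scaled by exp(old_max - new_max) whenever the running row max changes. With one tile the accumulator is still zero when that factor would apply, so skipping it changes nothing. With two or more tiles it does.

```python
import math

def attend(tiles, rescale=True):
    """One query row against a list of KV tiles; each tile is (scores, values).

    Hypothetical host-side model, not the kernel: `acc` stands in for the
    unnormalized O accumulator, and the final division models the
    normalize step done in the epilogue.
    """
    head_dim = len(tiles[0][1][0])
    acc = [0.0] * head_dim
    row_max = -math.inf
    row_sum = 0.0
    for scores, values in tiles:
        new_max = max(row_max, max(scores))
        # Correction factor for everything accumulated under the old max.
        scale = math.exp(row_max - new_max)
        o_scale = scale if rescale else 1.0  # rescale=False models "O rescale disabled"
        probs = [math.exp(s - new_max) for s in scores]
        row_sum = row_sum * scale + sum(probs)
        acc = [a * o_scale + sum(p * v[d] for p, v in zip(probs, values))
               for d, a in enumerate(acc)]
        row_max = new_max
    # Epilogue: a single normalization pass over the accumulator.
    return [a / row_sum for a in acc]

def reference(tiles):
    """Plain softmax(scores) @ values over all tiles concatenated."""
    scores = [s for t in tiles for s in t[0]]
    values = [v for t in tiles for v in t[1]]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[d] for e, v in zip(exps, values)) / total
            for d in range(len(values[0]))]

# Toy data: two tiles of two keys each, head_dim = 2.
tile_a = ([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
tile_b = ([3.0, 2.0], [[2.0, 2.0], [4.0, 0.0]])

single_no_rescale = attend([tile_a], rescale=False)
two_with_rescale = attend([tile_a, tile_b], rescale=True)
two_no_rescale = attend([tile_a, tile_b], rescale=False)
```

In this model `single_no_rescale` agrees with `reference([tile_a])`, while `two_no_rescale` diverges from `reference([tile_a, tile_b])` because the first tile's contribution is never scaled down to the new row max.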