Commit Graph

1 Commits

Author SHA1 Message Date
0dc6fe4a7d Stage B progress: PV works for square (128,128), broken for (128,64)
- Bug 1 (V MN-major): Fix applied
- Bug 2 (softmax packing): Confirmed correct (V=I test: cosine 1.0)
- Bug 3 (ACCUMULATE): Fix applied (first PV must overwrite, not accumulate)
- Bug 4 (CURRENT): PV MMA broken for non-square output
  - (128,128) PV with random V: cosine 0.999999 
  - (128,64) PV with MN-major V: cosine ~0.01 
  - Softmax packing, layout aliasing, pipeline ordering all verified correct
  - Root cause unknown — likely epilogue/V layout/MMA tiler issue

Added test_pv_diag.py (V=I and random V, 128x128 output — PASS)
Added test_layout_compare.py (TMEM layout inspection)
Added test_inspect_types.py (TMEM pointer arithmetic verification)
Updated test_mma_si_pv.py with head_dim param, pv_mma_tiler_mn fix, ACCUMULATE fix
Updated READMEs with current state
2026-05-21 04:40:28 +00:00