6c0a6cf50b
Root cause FOUND: V SMEM only holds 1 K-tile (2048 BF16), but PV MMA iterates 8 K-phases. For non-(128,128) PV, most K-phases read wrong V data. Zero-padded V works because V is (128,128) covering all 8 K-phases. FMHA interleaves QK+PV per KV-tile to avoid this.
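A toy model of the failure described above, as a hedged sketch in plain Python rather than the actual kernel. It assumes that "wrong V data" means every K-phase after the first re-reads the single staged tile; the log does not say what the real kernel reads. The names `pv_mma`, `reference`, `TILE_K`, and the tiny sizes are all hypothetical. Setting `reload_each_phase=True` stands in for an interleaved schedule in which a fresh V tile is staged before each phase.

```python
# Toy sizes (hypothetical); the log describes 8 K-phases over a 2048-element BF16 tile.
K_PHASES = 8
TILE_K = 2                 # rows of V that the staging buffer can hold
M, N = 2, 2
K = K_PHASES * TILE_K


def reference(P, V):
    """Plain O = P @ V over the full K extent."""
    return [[sum(P[i][k] * V[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def pv_mma(P, V, reload_each_phase):
    """Accumulate O over K-phases, reading V only through a one-tile buffer."""
    smem = [row[:] for row in V[:TILE_K]]      # buffer holds a single K-tile
    O = [[0] * N for _ in range(M)]
    for phase in range(K_PHASES):
        if reload_each_phase:
            # Interleaved schedule: stage the tile this phase actually needs.
            lo = phase * TILE_K
            smem = [row[:] for row in V[lo:lo + TILE_K]]
        for i in range(M):
            for j in range(N):
                for kk in range(TILE_K):
                    O[i][j] += P[i][phase * TILE_K + kk] * smem[kk][j]
    return O


P = [[1] * K for _ in range(M)]
V = [[k * N + j + 1 for j in range(N)] for k in range(K)]

print("stale buffer matches reference:", pv_mma(P, V, False) == reference(P, V))
print("interleaved matches reference: ", pv_mma(P, V, True) == reference(P, V))
```

In this model only the first phase sees correct V data when the buffer is never refreshed, which is the shape of the mismatch the commit message reports.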
80da5c51a6
Key finding: PV A-fragment layout is IDENTICAL for (128,128)/(128,32)/(128,16) PV. Bug is NOT TMEM alias. cta_tile_shape_mnk wrong for non-(128,128) PV. V SMEM and O C-fragment sizes look correct. Debugging V/epilogue paths.
73cb3a3277
Debugging TMEM alias for (128,16) PV: zero output confirmed, PV reads from wrong TMEM columns. Need to align softmax P write with PV A-fragment layout.