biondizzle
f94978ffa7
D1.5: Prepare for SMEM accumulator implementation
- Added epilogue utility imports (transform_partitioned_tensor_layout, etc.)
- Re-added pv_done_bar for SMEM accumulator synchronization
- Backed up current fmha.py as fmha_backup_v2.py
- SMEM accumulator approach: one-way TMEM→REGS→SMEM copy per kt, accumulating in FP32 in SMEM
2026-05-26 21:00:41 +00:00
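
The accumulation pattern named in the commit message can be sketched in plain Python. This is only an illustration, not the code in fmha.py. It assumes that "kt" indexes a tile along the reduction (K) dimension and that each tile yields a partial P·V product. The function name `accumulate_pv`, its parameters, and the list-based stand-ins for TMEM, REGS, and SMEM are all hypothetical. Any softmax rescaling the real kernel may need is left out.

```python
from array import array


def accumulate_pv(p, v, tile_k):
    """Sum per-tile partial products of P (M x K) and V (K x N).

    Hypothetical stand-ins: each tile's partial product plays the role of
    the TMEM result, `regs` plays the role of registers, and `smem_acc`
    plays the role of the FP32 accumulator in shared memory.
    """
    m, k, n = len(p), len(v), len(v[0])
    # array("f") stores C floats, which are 32-bit on common platforms.
    smem_acc = array("f", [0.0] * (m * n))

    for k0 in range(0, k, tile_k):
        k1 = min(k0 + tile_k, k)
        # Partial product for this tile only (assumed "TMEM" -> "REGS" step).
        regs = [
            sum(p[i][kk] * v[kk][j] for kk in range(k0, k1))
            for i in range(m)
            for j in range(n)
        ]
        # One-way add into the accumulator; nothing is copied back.
        for idx, val in enumerate(regs):
            smem_acc[idx] += val

    # Return the accumulator as an M x N list of rows.
    return [list(smem_acc[i * n:(i + 1) * n]) for i in range(m)]
```

Because each tile contributes an independent partial sum, the result should not depend on `tile_k` apart from FP32 rounding; with small integer inputs the sums are exact.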