nvfp4-megamoe-kernel/dsv4/kernels
biondizzle f94978ffa7 D1.5: Prepare for SMEM accumulator implementation
- Added epilogue utility imports (transform_partitioned_tensor_layout, etc.)
- Re-added pv_done_bar for SMEM accumulator synchronization
- Backed up current fmha.py as fmha_backup_v2.py
- SMEM accumulator approach: each kt step makes a one-way TMEM→REGS→SMEM transfer, and the results accumulate in FP32 in SMEM
2026-05-26 21:00:41 +00:00
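
The accumulation pattern named in the commit message can be illustrated with a small host-side sketch. This is not the kernel code from fmha.py: it is plain Python that assumes `kt` indexes tiles along the reduction dimension, and it uses hypothetical names (`matmul_tile`, `accumulate_over_k_tiles`, `tile_k`) chosen only for illustration. An `array("f")` buffer stands in for the FP32 SMEM accumulator. Barriers such as `pv_done_bar` and any softmax rescaling are left out.

```python
from array import array


def matmul_tile(p_tile, v_tile):
    """Partial product for one k-tile: (M x Kt) @ (Kt x N), flattened row-major."""
    m, kt, n = len(p_tile), len(v_tile), len(v_tile[0])
    return [
        sum(p_tile[i][k] * v_tile[k][j] for k in range(kt))
        for i in range(m)
        for j in range(n)
    ]


def accumulate_over_k_tiles(p, v, tile_k):
    """Accumulate per-tile partial products into a float32 buffer.

    Assumption: each loop iteration models one 'kt' step of the commit message.
    """
    m, n = len(p), len(v[0])
    # Stand-in for the FP32 SMEM accumulator (array('f') stores 32-bit floats).
    smem_acc = array("f", [0.0] * (m * n))

    for k0 in range(0, len(v), tile_k):
        p_tile = [row[k0:k0 + tile_k] for row in p]
        v_tile = v[k0:k0 + tile_k]

        tmem = matmul_tile(p_tile, v_tile)  # stand-in for the MMA result in TMEM
        regs = list(tmem)                   # one-way copy: TMEM -> REGS

        # REGS -> SMEM: add this tile's contribution; nothing is read back
        # into TMEM, which is what "one-way" is taken to mean here.
        for idx, val in enumerate(regs):
            smem_acc[idx] += val

    return [list(smem_acc[i * n:(i + 1) * n]) for i in range(m)]


if __name__ == "__main__":
    p = [[1, 2, 3, 4], [0, 1, 0, 1]]
    v = [[1, 0], [0, 1], [1, 1], [2, 0]]
    print(accumulate_over_k_tiles(p, v, tile_k=2))
```

In this sketch the accumulator lives outside the per-tile result, so each tile's output can be discarded once it has been added in.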