nvfp4-megamoe-kernel/dsv4/kernels
biondizzle fd7c0cb773 P6: Fix TMA store — use bulk_group (commit+wait) not mbarrier
The TMA store uses cp.async.bulk.tensor.2d.global.shared::cta.tile.bulk_group,
not the mbarrier::complete_tx::bytes completion mechanism. Completion is tracked via:
  - cp.async.bulk.commit_group (after issuing the stores)
  - cp.async.bulk.wait_group.read 0 (wait until all committed groups have finished reading SMEM)

Removed sMbarStore from SMEM allocations (no longer needed).
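
A minimal sketch of the store path described above, written as CUDA with inline PTX. It is not the kernel's actual code: the helper names (tma_store_2d, tma_store_commit, tma_store_wait_read, store_tile_epilogue) and the choice of thread 0 as the issuing thread are assumptions for illustration. It also assumes an sm_90 or newer target, a CUtensorMap already encoded on the host, and the fence.proxy.async step that is usually needed when ordinary stores filled the SMEM tile.

```cuda
#include <cuda.h>     // CUtensorMap
#include <cstdint>

// Hypothetical helper: issue one 2D TMA store, SMEM -> global.
// Completion is tracked by the bulk async-group mechanism, not by an mbarrier.
__device__ __forceinline__ void tma_store_2d(const CUtensorMap* tmap,
                                             const void* smem_src,
                                             int c0, int c1) {
    uint32_t smem_addr =
        static_cast<uint32_t>(__cvta_generic_to_shared(smem_src));
    asm volatile(
        "cp.async.bulk.tensor.2d.global.shared::cta.tile.bulk_group "
        "[%0, {%2, %3}], [%1];"
        :
        : "l"(reinterpret_cast<uint64_t>(tmap)), "r"(smem_addr),
          "r"(c0), "r"(c1)
        : "memory");
}

// Hypothetical helper: close the current bulk group (all stores issued so far).
__device__ __forceinline__ void tma_store_commit() {
    asm volatile("cp.async.bulk.commit_group;" ::: "memory");
}

// Hypothetical helper: wait until at most N committed groups are still
// reading their SMEM source. N = 0 means every group has finished reading.
template <int N>
__device__ __forceinline__ void tma_store_wait_read() {
    asm volatile("cp.async.bulk.wait_group.read %0;" :: "n"(N) : "memory");
}

// Hypothetical epilogue: the tile in smem_tile was written with ordinary
// (generic-proxy) stores by all threads of the CTA.
__device__ void store_tile_epilogue(const CUtensorMap* tmap,
                                    const void* smem_tile,
                                    int c0, int c1) {
    // Make the generic-proxy SMEM writes visible to the async proxy (TMA).
    asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
    __syncthreads();

    if (threadIdx.x == 0) {           // assumption: one thread issues the store
        tma_store_2d(tmap, smem_tile, c0, c1);
        tma_store_commit();
        tma_store_wait_read<0>();     // SMEM tile may be overwritten after this
    }
    __syncthreads();                  // other threads must not reuse SMEM earlier
}
```

Because completion is tracked per issuing thread through bulk groups, no store-side mbarrier has to be allocated or initialized in shared memory, which is consistent with dropping sMbarStore. Note that the .read form only guarantees that the SMEM source has been read; it does not by itself guarantee that the global writes are visible to other threads.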
2026-05-30 16:57:35 +00:00