From 8002b769c01c4667b63d3f9fed1d87d8599f298c Mon Sep 17 00:00:00 2001
From: Chenggang Zhao
Date: Tue, 25 Mar 2025 18:13:24 +0800
Subject: [PATCH] Update README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d1b0817..6266863 100644
--- a/README.md
+++ b/README.md
@@ -152,7 +152,7 @@ The [Tensor Memory Accelerator](https://docs.nvidia.com/cuda/hopper-tuning-guide
 
 - TMA load for LHS, LHS scaling factors, and RHS matrices
 - TMA store for the output matrix
-- TMA multicast (exclusive to the LHS matrix)
+- TMA multicast (automatically decide LHS or RHS to broadcast)
 - TMA descriptor prefetching
 
 #### Common detail optimizations