Update README

Chenggang Zhao
2025-09-25 16:27:57 +08:00
parent 3f71de7aa9
commit 904b721731

@@ -33,7 +33,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 - [ ] Larger TMA multicast size for some shapes
 - [x] MMA template refactor with CUTLASS
 - [x] Remove shape limitations on N and K
-- [ ] BF16 kernels
+- [x] BF16 kernels
 - [ ] Split/stream-k optimizations
 - [ ] Ampere kernels
 - [ ] Polish docs