* Merge with private repo
* Update README
* Update README
* Update README
* Add PyTorch requirements
* Fix sync scopes for MQA logits (#256)
* Update README
- Add BF16 support for SM90 and SM100
- Refactor Python APIs
- Other fixes and code refactoring