Commit Graph

5 Commits

Author SHA1 Message Date
Chenggang Zhao
7f2a703ed5 [Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)
* Merge with private repo

* Update README

* Update README

* Update README

* Add PyTorch requirements

* Fix sync scopes for MQA logits (#256)

* Update README
2026-04-17 09:45:14 +08:00
Jun Jiang
6e74faad5c Upgrade to CUTLASS 4.2.1 (#203) 2025-10-09 09:09:22 +08:00
Rain Jiang
51d1e9cdd3 Support compilation with CUDA 13.0 (#174) 2025-08-27 09:30:08 +08:00
Ray Wang
9da4a23561 Add more GPU architectures support (#112)
* Add more GPU architectures support

* Update layout.py

* Optimize performance, Add SM90 support, Add 1D2D SM100 support

* Add fmtlib submodule at commit 553ec11

---------

Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-07-18 11:32:22 +08:00
Chenggang Zhao
a6d97a1c1b Initial commit 2025-02-25 22:52:41 +08:00