Chenggang Zhao
|
7f2a703ed5
|
[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)
* Merge with private repo
* Update README
* Update README
* Update README
* Add PyTorch requirements
* Fix sync scopes for MQA logits (#256)
* Update README
|
2026-04-17 09:45:14 +08:00 |
|
Zhean Xu
|
0f5f266202
|
Multiple updates and refactorings (#280)
|
2026-01-16 17:06:52 +08:00 |
|
Ray Wang
|
38f8ef73a4
|
Multiple updates and refactorings (#231)
|
2025-11-21 17:49:47 +08:00 |
|
Chenggang Zhao
|
ea9c5d9270
|
Use driver API
|
2025-08-28 09:40:49 +08:00 |
|
Ray Wang
|
f85ec649d7
|
Make various updates and fixes: (#164)
- Add BF16 support for SM90 and SM100
- Refactor Python APIs
- Other fixes and code refactoring
|
2025-08-15 18:32:35 +08:00 |
|
Ray Wang
|
d9c363f86f
|
Make various updates and fixes:
- Add support for legacy CUDA versions; now compatible with CUDA 12.3 and newer
- Add support for NVRTC compilation
- Other fixes and code refactoring
|
2025-08-02 19:52:22 -07:00 |
|