Ray Wang
|
38f8ef73a4
|
Multiple updates and refactorings (#231)
|
2025-11-21 17:49:47 +08:00 |
|
oliver könig
|
9f196058ae
|
chore: Build and store bdist wheels (#181)
* build: Minor tweaks for wheel build
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Workflows for wheel build
Signed-off-by: oliver könig <okoenig@nvidia.com>
* fix
Signed-off-by: oliver könig <okoenig@nvidia.com>
* fix
Signed-off-by: oliver könig <okoenig@nvidia.com>
* build: Add CachedWheel
Signed-off-by: oliver könig <okoenig@nvidia.com>
* add version to init
Signed-off-by: oliver könig <okoenig@nvidia.com>
* revert
Signed-off-by: oliver könig <okoenig@nvidia.com>
* revert
Signed-off-by: oliver könig <okoenig@nvidia.com>
* revert
Signed-off-by: oliver könig <okoenig@nvidia.com>
* v2
Signed-off-by: oliver könig <okoenig@nvidia.com>
* update
Signed-off-by: oliver könig <okoenig@nvidia.com>
* test
Signed-off-by: oliver könig <okoenig@nvidia.com>
* from packaging.version import parse
Signed-off-by: oliver könig <okoenig@nvidia.com>
* local version
Signed-off-by: oliver könig <okoenig@nvidia.com>
* remove file
Signed-off-by: oliver könig <okoenig@nvidia.com>
* revert
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Updates and lint
* revert missing cudaextension args
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Add timeout
* fix DG settings
Signed-off-by: oliver könig <okoenig@nvidia.com>
* DG_USE_LOCAL_VERSION
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Update version
* Detect local changes
* Minor fix
* Revert CUTLASS
* Unify options
---------
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
|
2025-10-10 18:23:40 +08:00 |
|
Chenggang Zhao
|
80ceeb2c76
|
Add SM90 kernels (#200)
|
2025-09-29 17:00:23 +08:00 |
|
Ray Wang
|
3f71de7aa9
|
Make various updates and fixes (#198)
|
2025-09-25 16:19:07 +08:00 |
|
Ray Wang
|
f85ec649d7
|
Make various updates and fixes: (#164)
- Add BF16 support for SM90 and SM100
- Refactor Python APIs
- Other fixes and code refactoring
|
2025-08-15 18:32:35 +08:00 |
|
Ray Wang
|
d9c363f86f
|
Make various updates and fixes:
- Add support for legacy CUDA versions; now compatible with CUDA 12.3 and newer
- Add support for NVRTC compilation
- Other fixes and code refactoring
|
2025-08-02 19:52:22 -07:00 |
|
Ray Wang
|
9da4a23561
|
Add more GPU architectures support (#112)
* Add more GPU architectures support
* Update layout.py
* Optimize performance, Add SM90 support, Add 1D2D SM100 support
* Add fmtlib submodule at commit 553ec11
---------
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-07-18 11:32:22 +08:00 |
|
Zhean Xu
|
04278f6dee
|
Weight gradient kernels for dense and MoE models (#95)
* Init weight gradient kernels.
* Support unaligned n,k and gmem stride
* Update docs
* Several cleanups
* Remove restrictions on N
* Add stride(0) assertions
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
|
2025-05-14 14:47:58 +08:00 |
|
AcraeaTerpsicore
|
96b31fd6bb
|
fix typo
|
2025-02-26 18:37:22 +08:00 |
|
Chenggang Zhao
|
a6d97a1c1b
|
Initial commit
|
2025-02-25 22:52:41 +08:00 |
|