Wentao Ye
|
77c09e1130
|
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 10:57:06 -08:00 |
|
Wentao Ye
|
6c97b9b9b6
|
[Perf] Only clone when needed for moe_permute (#32273)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-20 11:34:39 -05:00 |
|
Caleb_Du
|
57c22e57f9
|
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-07-27 07:08:00 -07:00 |
|
Tyler Michael Smith
|
6e588da0f4
|
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-22 12:13:54 -07:00 |
|
Caleb_Du
|
3e887d2e0c
|
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-05-02 11:31:55 -07:00 |
|