Michael Goin | 6d4e27ce29 | [Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374); Signed-off-by: mgoin <mgoin64@gmail.com> | 2026-02-12 12:08:06 -08:00
Kunshang Ji | cb9574eb85 | [XPU][9/N] clean up existing ipex code/doc (#34111); Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> | 2026-02-11 00:27:15 -08:00
Roberto L. Castro | afdce12c89 | [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680); Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>; Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>; Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> | 2026-02-10 10:29:52 -05:00
Xin Yang | 79028d4388 | [Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568) | 2026-02-05 20:34:00 -05:00
Pleaplusone | 6c20e89c02 | [ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287); Signed-off-by: ganyi <ygan@amd.com> | 2026-01-21 23:16:30 +08:00