Roberto L. Castro
|
afdce12c89
|
[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-10 10:29:52 -05:00 |
|
Xin Yang
|
79028d4388
|
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
|
2026-02-05 20:34:00 -05:00 |
|
Daniel Cámpora
|
184076c3fe
|
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-08 06:55:58 -08:00 |
|
Lain
|
09a7e6f617
|
[Deepseek v3.2] Remove extra logics in indexer (#26465)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-21 23:34:03 +00:00 |
|
Daniel Cámpora
|
80e9452984
|
[Deepseek v3.2] Optimize top_k_per_row (#26763)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-21 08:30:07 +00:00 |
|
Daniel Cámpora
|
0e67102d93
|
Added test_top_k_per_row to test-pipeline.yaml. (#26569)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-10 10:48:33 -04:00 |
|
Daniel Cámpora
|
e1098ced95
|
Add topk logits torch op for DS3.2. (#25945)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-10-07 10:07:32 +00:00 |
|