Carl Y
|
3bc2734dd0
|
[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-03 01:47:04 +00:00 |
|
Olya Kozlova
|
598190aac3
|
[fix] Remove trtllm ragged mla prefills (#36540)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
|
2026-03-31 12:30:27 -07:00 |
|
Pleaplusone
|
d9d342d214
|
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-26 12:45:28 +08:00 |
|
courage17340
|
981cadb35c
|
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-11-06 17:52:13 +08:00 |
|
Lucas Wilkinson
|
ce75efeecb
|
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-05-28 08:59:39 +00:00 |
|
DefTruth
|
e82ee40de3
|
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-16 03:31:39 -07:00 |
|
DefTruth
|
e9528f6dc6
|
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-11 06:50:50 -06:00 |
|