| Date | Commit | Author | Subject |
| --- | --- | --- | --- |
| 2026-03-17 20:09:20 +00:00 | `b36adfa349` | Wei Zhao | [Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252) |
| 2026-03-16 13:24:48 -07:00 | `93f3c8e531` | Matthew Bonanni | [Misc] Add float16 to CacheDType (#37199) |
| 2026-03-12 08:32:34 -07:00 | `a1257fd1ea` | grimulkan | [Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597) |
| 2026-03-11 19:19:15 +08:00 | `e584dce52b` | Wuxun Zhang | Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230) |
| 2026-03-11 07:45:57 +00:00 | `a40ee486f2` | JartX (with akaratza, TJian) | [Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923) |
| 2026-03-09 12:02:41 -05:00 | `c174d54f86` | Andreas Karatzas | [ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292) |
| 2026-03-08 20:05:24 -07:00 | `a0f44bb616` | Harry Mellor | Allow markdownlint to run locally (#36398) |
| 2026-03-07 13:51:54 -08:00 | `379689d533` | Wei Zhao | [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) |
| 2026-03-06 15:15:12 +08:00 | `807d680337` | Andreas Karatzas | [ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553) |
| 2026-03-05 20:21:06 -08:00 | `c5362c739f` | Rohan Potdar | Reenable features for ROCm attention backends (#36185) |
| 2026-03-05 10:51:26 -06:00 | `8c760b6ab6` | Sage Moore | [ROCm] Refactor ROCm attention backend selection logic (#35246) |
| 2026-03-01 23:44:57 +00:00 | `8b5014d3dd` | Lucas Wilkinson (with Matthew Bonanni, Tyler Michael Smith) | [Attention] FA4 integration (#32974) |
| 2026-02-28 12:16:34 +08:00 | `0edf101d2b` | Micah Williamson | [ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527) |
| 2026-02-27 21:32:55 +00:00 | `9fa6c68fa6` | Gregory Shtrasberg | [ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334) |
| 2026-02-12 17:21:54 +00:00 | `f2c47886fd` | Matthew Bonanni (with Lucas Wilkinson) | [Attention] Add FlashInfer Sparse MLA backend (#33451) |
| 2026-02-09 18:33:43 -05:00 | `5e75a14a66` | Michael Goin | [Doc] Add DCP support to attention backend doc (#33936) |
| 2026-01-31 17:03:57 +08:00 | `527bcd14d4` | jennyyyyzhen | [ROCM] Enable aiter attn backend for qwen3-next model (#32492) |
| 2026-01-28 17:20:22 -05:00 | `77c4f45c6c` | Matthew Bonanni | [7/N][Attention][Docs] Add documentation for attention backends (#32477) |