Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
jennyyyyzhen
|
a4cf9b22ba
|
[ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228) (#37228)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
Co-authored-by: yZhen <yZhen@fb.com>
|
2026-03-26 10:33:39 -07:00 |
|
Wentao Ye
|
c59a132f96
|
[V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 20:10:11 -07:00 |
|
Chuan (Richard) Li
|
e99fb98867
|
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs (#36100)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-23 15:48:31 +08:00 |
|
Matthew Bonanni
|
93f3c8e531
|
[Misc] Add float16 to CacheDType (#37199)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:24:48 -07:00 |
|
Rohan Potdar
|
a4ad9db541
|
Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) (#35786)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-13 07:33:22 +00:00 |
|
Mengtao (Martin) Yuan
|
1a9718085c
|
Fix CUDA graph decode capture crash in AITER FlashAttention (#36042)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
|
2026-03-06 18:12:07 -08:00 |
|
Andreas Karatzas
|
807d680337
|
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 15:15:12 +08:00 |
|
Rohan Potdar
|
c5362c739f
|
Reenable features for ROCm attention backends (#36185)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-05 20:21:06 -08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Sage Moore
|
8c760b6ab6
|
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-05 10:51:26 -06:00 |
|
Kunshang Ji
|
8ad54a991b
|
[Platform] Add current_platform.num_compute_units interface (#35042)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-02-24 22:22:49 -08:00 |
|
Rohan Potdar
|
2ff4e51152
|
[ROCm] AITER fused RoPE+KVCache (#33443)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: charlifu <charlifu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
|
2026-02-23 19:06:00 -08:00 |
|
jennyyyyzhen
|
2aab2bb543
|
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
|
2026-02-20 20:32:05 -08:00 |
|
Andreas Karatzas
|
cf93c1a128
|
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:25:07 -08:00 |
|
Kevin McKay
|
8de7c636cc
|
[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support (#32877)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-19 22:25:46 -08:00 |
|
Andreas Karatzas
|
de42abb366
|
[CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-13 20:04:29 -08:00 |
|
Mengtao (Martin) Yuan
|
9ea1f598ce
|
Use paged_attention_v1 for sliding window decode in rocm_aiter_fa (#34378)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
|
2026-02-12 16:14:43 -08:00 |
|
kliuae
|
64f570ab56
|
[ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2026-02-11 16:26:44 +00:00 |
|
Andreas Karatzas
|
350ca72c04
|
[ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-06 15:08:16 +00:00 |
|
Rabi Mishra
|
9eb58f8cf1
|
fix[ROCm]: Remove unconditional aiter import (#32902)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-02-02 22:10:02 +08:00 |
|
jennyyyyzhen
|
527bcd14d4
|
[ROCM] Enable aiter attn backend for qwen3-next model (#32492)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
|
2026-01-31 17:03:57 +08:00 |
|
Matthew Bonanni
|
a608b4c6c2
|
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-27 10:02:51 -05:00 |
|
Pleaplusone
|
130d6c9514
|
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#29887)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-15 15:29:53 +00:00 |
|
Matthew Bonanni
|
20228cb851
|
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-12 09:13:56 -08:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Lucas Wilkinson
|
c7a79d41a0
|
[Attention][3/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31850)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-07 13:31:34 +08:00 |
|
Pleaplusone
|
8c363ed666
|
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-30 11:31:50 +00:00 |
|
tongqiu
|
5253f4276f
|
[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention (#28376)
Signed-off-by: apinge <Tong.Qiu2@amd.com>
|
2025-11-24 03:26:00 +00:00 |
|
Nicolò Lucchesi
|
066209a045
|
[Attention] Refactor FA block_size limitations to hybrid models only (#29084)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-22 06:38:44 -08:00 |
|
Xiake Sun
|
60e089f0b9
|
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
|
2025-11-16 20:52:11 -08:00 |
|
Pleaplusone
|
8da2f28f53
|
[ROCm][BugFix]Fix get_cu_count in rocm_aiter_fa.py (#28618)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-13 14:18:20 +00:00 |
|
Pleaplusone
|
ca00b1bfc6
|
[ROCm][BugFix] Remove the usage of device_info from aiter (#28383)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-12 21:43:42 -08:00 |
|
Benjamin Chislett
|
304419576a
|
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-13 01:56:40 +09:00 |
|
Matthew Bonanni
|
b30dfa03c5
|
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-11 07:40:44 -05:00 |
|
Lucas Wilkinson
|
e8697faf03
|
[V0 deprecation] Remove no longer used get_metadata_cls (#28370)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-10 14:32:09 +08:00 |
|
Pleaplusone
|
dc937175d4
|
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation (#25763)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-04 18:05:33 +00:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
Boyuan Feng
|
a86b4c58e8
|
remove attn output view kernel (#26680)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 22:53:10 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Zhiyuan Li
|
d24cf322e1
|
[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486)
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
|
2025-10-08 23:43:39 -07:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Yongye Zhu
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
Matthew Bonanni
|
3468f17ebe
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-25 17:37:50 +00:00 |
|
Rohan Potdar
|
bbdc0f2366
|
[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation (#25104)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2025-09-18 17:46:47 +00:00 |
|
Douglas Lehr
|
1a456c7c90
|
Aiter mha fp8 fix (#24991)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
|
2025-09-17 22:29:14 +00:00 |
|
Russell Bryant
|
37e8182bfe
|
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
|
2025-09-10 13:53:35 -07:00 |
|
Hyogeun Oh (오효근)
|
4e4d017b6f
|
[Docs] Fix warnings in mkdocs build (continued) (#23743)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
|
2025-08-27 17:17:29 +00:00 |
|
elvischenv
|
24d0c9e6ed
|
[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-22 22:09:05 +00:00 |
|
Woosuk Kwon
|
d6d13bd49e
|
[Misc] Add max_seq_len to CommonAttentionMetadata (#23216)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-20 09:05:29 -07:00 |
|