Cyrus Leung | 33b06a6f24 | [Misc] Remove redundant attention var constants (#29650); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-11-28 04:35:19 -08:00 |
Roger Wang | 0ff70821c9 | [Core] Deprecate xformers (#29262); Signed-off-by: Roger Wang <hey@rogerw.io> | 2025-11-24 04:18:55 +00:00 |
rasmith | 5e5a7eb16f | [CI/Build] Make test_attention_selector.py run tests on correct platform (#29064); Signed-off-by: Randall Smith <ransmith@amd.com>; Signed-off-by: rasmith <Randall.Smith@amd.com>; Co-authored-by: Randall Smith <ransmith@amd.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-11-20 20:45:56 +00:00 |
Li, Jiang | 7f829be7d3 | [CPU] Refactor CPU attention backend (#27954); Signed-off-by: jiang1.li <jiang1.li@intel.com> | 2025-11-12 09:43:06 +08:00 |
Matthew Bonanni | b30dfa03c5 | [Attention] Refactor CUDA attention backend selection logic (#24794); Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>; Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>; Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> | 2025-11-11 07:40:44 -05:00 |
Pleaplusone | 6cae1e5332 | [ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224); Signed-off-by: ganyi <ygan@amd.com>; Co-authored-by: wuhuikx <hattie.wu@amd.com> | 2025-11-05 10:43:02 -05:00 |
Wenzheng Bi | ec10fd0abc | [Bugfix] Move current_platform import to avoid python import cache. (#16601); Signed-off-by: iwzbi <wzbi@zju.edu.cn> | 2025-10-09 10:46:19 +00:00 |
Matthew Bonanni | 76879cc160 | [Attention] Implement universal BACKEND_MAP (#25900); Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> | 2025-10-08 12:00:25 -07:00 |
Lucas Wilkinson | f80e7866c0 | [Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125); Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> | 2025-10-08 10:09:34 +08:00 |
Cyrus Leung | 1e4ecca1d0 | [V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-10-07 15:42:31 +00:00 |
Harry Mellor | 6c04638214 | Fix per file ruff ignores related to line length (#26262); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-10-06 05:12:40 +00:00 |
Harry Mellor | d6953beb91 | Convert formatting to use ruff instead of yapf + isort (#26247); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-10-05 07:06:22 -07:00 |
Matthew Bonanni | 3468f17ebe | [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489); Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>; Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> | 2025-09-25 17:37:50 +00:00 |
Thomas Parnell | 969b4da3a6 | [V0 Deprecation] Remove placeholder attn (#25510); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> | 2025-09-23 22:12:14 +00:00 |
Isotr0py | b6a136b58c | [CI/Build] Fix disabled v1 attention backend selection test (#25471); Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-09-23 13:05:46 +00:00 |
Woosuk Kwon | bc6e542d9f | Remove V0 attention backends (#25351); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-09-21 16:03:28 -07:00 |
Woosuk Kwon | 52c2a8d4ad | [V0 Deprecation] Remove LLMEngine (#25033); Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>; Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-09-20 17:56:30 -07:00 |
Michael Goin | 087c6ffc92 | [CI Bugfix] Fix failing test_invalid_env (#25078); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-09-17 08:28:58 -07:00 |
Matthew Bonanni | 5fe643fc26 | Add FLASHINFER_MLA to backend selector test (#24753); Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> | 2025-09-12 22:30:07 +00:00 |
Lucas Wilkinson | 402759d472 | [Attention] FlashAttn MLA (#14258); Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>; Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>; Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>; Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>; Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> | 2025-09-04 02:47:59 -07:00 |
Woosuk Kwon | 14006840ea | [V0 Deprecation] Remove V0 FlashInfer attention backend (#22776); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-08-18 19:54:16 -07:00 |
Michael Goin | e79a12fc3a | [UX] Fail if an invalid attention backend is specified (#22217); Signed-off-by: mgoin <michael@neuralmagic.com> | 2025-08-04 23:54:52 -07:00 |
Cyrus Leung | 9fb52e523a | [V1] Support any head size for FlexAttention backend (#20467); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-07-06 09:54:36 -07:00 |
Woosuk Kwon | e202dd2736 | [V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>; Signed-off-by: jiang1.li <jiang1.li@intel.com>; Co-authored-by: Li, Jiang <jiang1.li@intel.com> | 2025-07-06 08:48:13 -07:00 |
Isotr0py | 32c9be2200 | [v1] Re-add fp32 support to v1 engine through FlexAttention (#19754); Signed-off-by: Isotr0py <2037008807@qq.com>; Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-07-05 09:41:10 +00:00 |
TY-AMD | 96453cfa83 | [BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067); Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> | 2025-07-01 16:12:19 +08:00 |
Isotr0py | 5f1ac1e1d1 | Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404) | 2025-06-10 01:30:20 -07:00 |
Isotr0py | b8089195b4 | [v1] Add fp32 support to v1 engine through flex attn (#19319); Signed-off-by: Isotr0py <2037008807@qq.com>; Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> | 2025-06-09 22:10:44 +08:00 |
Li, Jiang | 4555143ea7 | [CPU] V1 support for the CPU backend (#16441) | 2025-06-03 18:43:01 -07:00 |
Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100); Signed-off-by: simon-mo <simon.mo@hey.com> | 2025-06-03 11:20:17 -07:00 |
tracelogfb | 246e3e0a36 | fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873); Co-authored-by: Stephen Chen <tracelog@meta.com> | 2025-05-10 10:46:54 +08:00 |
vllmellm | 3c9396a64f | [FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523); Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>; Co-authored-by: qli88 <qiang.li2@amd.com>; Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> | 2025-05-09 10:42:05 +08:00 |
Michael Goin | 6317a5174a | Categorize tests/kernels/ based on kernel type (#16799); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-23 09:21:07 -04:00 |