Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
Lucas Wilkinson
|
bb85929aa6
|
[BugFix] Fix Python 3.13 FlashMLA import error (#34548)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-15 20:09:18 -08:00 |
|
Lucas Wilkinson
|
c7914d30f9
|
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-11 07:07:56 -08:00 |
|
Andrey Talman
|
f97ca67176
|
[Release 2.10] Update to Torch 2.10 - final release (#30525)
|
2026-02-08 13:51:09 -08:00 |
|
Luka Govedič
|
e3bf79ffa0
|
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
|
2026-02-04 19:54:27 -08:00 |
|
Lucas Wilkinson
|
2267cb1cfd
|
[Attention][FA3] Update FA3 to include new swizzle optimization (#23465)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-03 08:08:47 -08:00 |
|
Lucas Wilkinson
|
889722f3bf
|
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 22:02:39 -07:00 |
|
Lucas Wilkinson
|
b4f64e5b02
|
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 13:03:37 +08:00 |
|
Lucas Wilkinson
|
be6a81f31b
|
[chore] Update FA commit (#30460)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-07 23:24:18 -08:00 |
|
Roberto L. Castro
|
fdcc5176be
|
[BugFix] Fix architecture flags to prevent issues on SM103 (#31150)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
|
2026-01-05 20:11:35 +00:00 |
|
Shengqi Chen
|
511e81e7c9
|
[BUILD] use sm_100f when compiling flashmla to fix support on sm103 (#30705)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-15 14:48:01 -08:00 |
|
Lucas Wilkinson
|
c68c7b403d
|
[BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-21 13:58:32 -08:00 |
|
Matthew Bonanni
|
4c23690f43
|
[Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-18 20:06:21 -08:00 |
|
Varun Sundar Rabindranath
|
9912b8ccb8
|
[Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-18 16:45:20 -08:00 |
|
Matthew Bonanni
|
8cc40f8992
|
[Attention] Bump FA for removed method (#28429)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-14 09:13:37 -08:00 |
|
Jonas M. Kübler
|
9c84ca8293
|
[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-11-10 12:06:04 -08:00 |
|
Abolfazl Shahbazi
|
d15afc1fd0
|
Refactor CPU/GPU extension targets for CMake build (#28026)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
|
2025-11-08 14:17:35 +08:00 |
|
Matthew Bonanni
|
b4fda58a2d
|
[MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-22 15:48:37 -07:00 |
|
Tao He
|
250fb1b8ea
|
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-21 18:27:03 +00:00 |
|
Lucas Wilkinson
|
9f020f4f31
|
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-18 12:44:39 -06:00 |
|
zhrrr
|
e471d7ca7e
|
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-15 04:09:44 +00:00 |
|
Roberto L. Castro
|
96ad65b7fe
|
[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-10 09:43:40 -07:00 |
|
Ming Yang
|
3b736e1c38
|
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-10-09 08:06:29 -07:00 |
|
Lucas Wilkinson
|
418d111f8c
|
[FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-02 11:06:14 -04:00 |
|
Yongye Zhu
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
Lucas Wilkinson
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
Matthew Bonanni
|
19fe1a0510
|
[Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-08-22 02:26:32 +00:00 |
|
Lucas Wilkinson
|
292084e72a
|
[BugFix] Fix for IMA in FA3 varlen combine (#22967)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-17 08:52:04 -07:00 |
|
Lucas Wilkinson
|
177e55e3bd
|
[Attention] FA3 Attention Sinks Perf Boost (#22478)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-15 17:41:07 -04:00 |
|
Thomas Parnell
|
bd875d2eb7
|
[Bugfix] Update FA commit hash (#22546)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 16:10:25 -07:00 |
|
Lucas Wilkinson
|
cd9b9de1fb
|
[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA (#21691)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-08 16:09:42 -07:00 |
|
Lucas Wilkinson
|
2cb6ef8996
|
[BugFix] Fix FA2 RuntimeError when sinks is provided (#22365)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
|
2025-08-06 08:03:03 -07:00 |
|
Woosuk Kwon
|
e3c876dca3
|
Upgrade FA3 for attention sink (#22313)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 21:36:21 -07:00 |
|
Woosuk Kwon
|
8acb4badee
|
[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 09:07:36 -07:00 |
|
Eli Uriegas
|
0d06b533a0
|
cmake: Update vllm_flash_attn for vllm_kernels (#20032)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
|
2025-06-24 22:44:10 +00:00 |
|
Lucas Wilkinson
|
a045b7e89a
|
[Perf] Improve/Fix-regression for FA3 in High QPS regimes (#19463)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-06-24 13:09:01 -04:00 |
|
Lucas Wilkinson
|
07334959d8
|
[Wheel Size] Only build FA2 8.0+PTX (#19336)
|
2025-06-17 12:32:49 +09:00 |
|
Luka Govedič
|
a3896c7f02
|
[Build] Fixes for CMake install (#18570)
|
2025-05-27 20:49:24 -04:00 |
|
yexin(叶鑫)
|
b22980a1dc
|
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
|
2025-04-25 14:52:28 +08:00 |
|
Lucas Wilkinson
|
41ca7eb491
|
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 20:12:21 -07:00 |
|
Lucas Wilkinson
|
183dad7a85
|
[Attention] Update to lastest FA3 code (#13111)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-17 15:14:07 -07:00 |
|
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
|
Pavani Majety
|
ed6ea06577
|
[Hardware] Update the flash attn tag to support Blackwell (#14244)
|
2025-03-05 22:01:37 -08:00 |
|
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |
|