Commit Graph

129 Commits

Author SHA1 Message Date
Jonas M. Kübler
77d2a5f17b pick up tuned prefill configs for FP8 FA3 (#36265)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
2026-03-17 07:00:26 -07:00
Li, Jiang
092ace9e3a [UX] Improve UX of CPU backend (#36968)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00
Matthew Bonanni
f444c05c32 [Attention] Use FA4 for MLA prefill (#34732)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-12 12:10:17 -04:00
typer-J
4184653775 feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-03-10 21:51:39 -07:00
Nikhil Gupta
0a49676fb0 cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2026-03-06 03:48:59 +00:00
Lucas Wilkinson
f44d1ddc8c [BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-02 21:58:16 -08:00
Lucas Wilkinson
8b5014d3dd [Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-01 23:44:57 +00:00
Ma Jian
90805ff464 [CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
2026-02-28 04:35:21 +00:00
Lucas Wilkinson
bb85929aa6 [BugFix] Fix Python 3.13 FlashMLA import error (#34548)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-15 20:09:18 -08:00
Maryam Tahhan
f07a128413 [CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-02-15 06:33:08 -08:00
Lucas Wilkinson
c7914d30f9 Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-11 07:07:56 -08:00
Andrey Talman
f97ca67176 [Release 2.10] Update to Torch 2.10 - final release (#30525) 2026-02-08 13:51:09 -08:00
Luka Govedič
e3bf79ffa0 Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841) 2026-02-04 19:54:27 -08:00
R3hankhan
4dffc5e044 [CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-02-03 19:37:15 -08:00
Lucas Wilkinson
2267cb1cfd [Attention][FA3] Update FA3 to include new swizzle optimization (#23465)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-03 08:08:47 -08:00
Maryam Tahhan
203d0bc0c2 [CPU] Improve CPU Docker build (#30953)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-24 17:08:24 +00:00
Fadi Arafeh
744ef30484 [CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 18:55:23 +00:00
Lucas Wilkinson
889722f3bf [FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00
Lucas Wilkinson
b4f64e5b02 Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 13:03:37 +08:00
Lucas Wilkinson
be6a81f31b [chore] Update FA commit (#30460)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-07 23:24:18 -08:00
Roberto L. Castro
fdcc5176be [BugFix] Fix architecture flags to prevent issues on SM103 (#31150)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
2026-01-05 20:11:35 +00:00
Li, Jiang
e3ab93c896 [CPU] Refactor CPU fused MOE (#30531)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 14:36:49 +08:00
Shengqi Chen
511e81e7c9 [BUILD] use sm_100f when compiling flashmla to fix support on sm103 (#30705)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-15 14:48:01 -08:00
Fadi Arafeh
f355ad5412 [CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-12 02:09:25 +00:00
Radu Salavat
180345807f [CMake][Build]: Remove unused ACL CMake env variables (#30339)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
2025-12-10 04:27:19 +00:00
Ralf Gommers
7c1ed45848 [CI/Build]: make it possible to build with a free-threaded interpreter (#29241)
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 15:21:46 -08:00
Lucas Wilkinson
c68c7b403d [BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-21 13:58:32 -08:00
Matthew Bonanni
4c23690f43 [Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-18 20:06:21 -08:00
Li, Jiang
20852c8f4c [CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Varun Sundar Rabindranath
9912b8ccb8 [Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-18 16:45:20 -08:00
Matthew Bonanni
8cc40f8992 [Attention] Bump FA for removed method (#28429)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 09:13:37 -08:00
Akash kaothalkar
86d15bfd8d [Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version (#28535)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-11-13 13:32:21 +00:00
Radu Salavat
d44fbbab0e [build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
2025-11-13 05:43:08 +00:00
Li, Jiang
7f829be7d3 [CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
Jonas M. Kübler
9c84ca8293 [FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-10 12:06:04 -08:00
Abolfazl Shahbazi
d15afc1fd0 Refactor CPU/GPU extension targets for CMake build (#28026)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Fadi Arafeh
a663f6ae64 [cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-27 11:14:55 +00:00
Matthew Bonanni
b4fda58a2d [MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-22 15:48:37 -07:00
Tao He
250fb1b8ea [Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-21 18:27:03 +00:00
Fadi Arafeh
163965d183 [cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
2025-10-21 02:02:58 +00:00
Lucas Wilkinson
9f020f4f31 [BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-18 12:44:39 -06:00
zhrrr
e471d7ca7e [CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-15 04:09:44 +00:00
ihb2032
086609de64 fix(nix): Allow local oneDNN path to fix vLLM CPU build failure (#26401)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-11 09:12:16 +00:00
Nishidha Panpaliya
8f8474fbe3 [CI/Build] Fix ppc64le CPU build and tests (#22443)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-10-11 13:04:42 +08:00
Roberto L. Castro
96ad65b7fe [Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00
Ming Yang
3b736e1c38 [Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-09 08:06:29 -07:00
Harry Mellor
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Fadi Arafeh
9705fba7b7 [cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-10-04 12:16:38 +08:00
Lucas Wilkinson
418d111f8c [FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-02 11:06:14 -04:00
Johnny
5234dc7451 [NVIDIA] Blackwell Family (#24673)
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
2025-10-01 10:50:54 -07:00