Jonas M. Kübler | 77d2a5f17b | 2026-03-17 07:00:26 -07:00
pick up tuned prefill configs for FP8 FA3 (#36265)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
Li, Jiang | 092ace9e3a | 2026-03-14 09:27:29 +08:00
[UX] Improve UX of CPU backend (#36968)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Matthew Bonanni | f444c05c32 | 2026-03-12 12:10:17 -04:00
[Attention] Use FA4 for MLA prefill (#34732)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
typer-J | 4184653775 | 2026-03-10 21:51:39 -07:00
feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
Nikhil Gupta | 0a49676fb0 | 2026-03-06 03:48:59 +00:00
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
|
Lucas Wilkinson | f44d1ddc8c | 2026-03-02 21:58:16 -08:00
[BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
Lucas Wilkinson | 8b5014d3dd | 2026-03-01 23:44:57 +00:00
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
Ma Jian | 90805ff464 | 2026-02-28 04:35:21 +00:00
[CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
|
Lucas Wilkinson | bb85929aa6 | 2026-02-15 20:09:18 -08:00
[BugFix] Fix Python 3.13 FlashMLA import error (#34548)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
Maryam Tahhan | f07a128413 | 2026-02-15 06:33:08 -08:00
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
Lucas Wilkinson | c7914d30f9 | 2026-02-11 07:07:56 -08:00
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Andrey Talman | f97ca67176 | 2026-02-08 13:51:09 -08:00
[Release 2.10] Update to Torch 2.10 - final release (#30525)
|
Luka Govedič | e3bf79ffa0 | 2026-02-04 19:54:27 -08:00
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
|
R3hankhan | 4dffc5e044 | 2026-02-03 19:37:15 -08:00
[CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
Lucas Wilkinson | 2267cb1cfd | 2026-02-03 08:08:47 -08:00
[Attention][FA3] Update FA3 to include new swizzle optimization (#23465)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Maryam Tahhan | 203d0bc0c2 | 2026-01-24 17:08:24 +00:00
[CPU] Improve CPU Docker build (#30953)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
Fadi Arafeh | 744ef30484 | 2026-01-22 18:55:23 +00:00
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
Lucas Wilkinson | 889722f3bf | 2026-01-21 22:02:39 -07:00
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Lucas Wilkinson | b4f64e5b02 | 2026-01-21 13:03:37 +08:00
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Lucas Wilkinson | be6a81f31b | 2026-01-07 23:24:18 -08:00
[chore] Update FA commit (#30460)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Roberto L. Castro | fdcc5176be | 2026-01-05 20:11:35 +00:00
[BugFix] Fix architecture flags to prevent issues on SM103 (#31150)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
|
Li, Jiang | e3ab93c896 | 2025-12-18 14:36:49 +08:00
[CPU] Refactor CPU fused MOE (#30531)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
Shengqi Chen | 511e81e7c9 | 2025-12-15 14:48:01 -08:00
[BUILD] use sm_100f when compiling flashmla to fix support on sm103 (#30705)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
Fadi Arafeh | f355ad5412 | 2025-12-12 02:09:25 +00:00
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
Radu Salavat | 180345807f | 2025-12-10 04:27:19 +00:00
[CMake][Build]: Remove unused ACL CMake env variables (#30339)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
Ralf Gommers | 7c1ed45848 | 2025-11-28 15:21:46 -08:00
[CI/Build]: make it possible to build with a free-threaded interpreter (#29241)
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
Lucas Wilkinson | c68c7b403d | 2025-11-21 13:58:32 -08:00
[BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Matthew Bonanni | 4c23690f43 | 2025-11-18 20:06:21 -08:00
[Attention] FlashAttention ViT support, make default backend (#28763)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
Li, Jiang | 20852c8f4c | 2025-11-19 10:32:00 +08:00
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
Varun Sundar Rabindranath | 9912b8ccb8 | 2025-11-18 16:45:20 -08:00
[Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
Matthew Bonanni | 8cc40f8992 | 2025-11-14 09:13:37 -08:00
[Attention] Bump FA for removed method (#28429)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
Akash kaothalkar | 86d15bfd8d | 2025-11-13 13:32:21 +00:00
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version (#28535)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
|
Radu Salavat | d44fbbab0e | 2025-11-13 05:43:08 +00:00
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
Li, Jiang | 7f829be7d3 | 2025-11-12 09:43:06 +08:00
[CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
Jonas M. Kübler | 9c84ca8293 | 2025-11-10 12:06:04 -08:00
[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
Abolfazl Shahbazi | d15afc1fd0 | 2025-11-08 14:17:35 +08:00
Refactor CPU/GPU extension targets for CMake build (#28026)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
|
Fadi Arafeh | a663f6ae64 | 2025-10-27 11:14:55 +00:00
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
Matthew Bonanni | b4fda58a2d | 2025-10-22 15:48:37 -07:00
[MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
Tao He | 250fb1b8ea | 2025-10-21 18:27:03 +00:00
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Fadi Arafeh | 163965d183 | 2025-10-21 02:02:58 +00:00
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
|
Lucas Wilkinson | 9f020f4f31 | 2025-10-18 12:44:39 -06:00
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
zhrrr | e471d7ca7e | 2025-10-15 04:09:44 +00:00
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
ihb2032 | 086609de64 | 2025-10-11 09:12:16 +00:00
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure (#26401)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
|
Nishidha Panpaliya | 8f8474fbe3 | 2025-10-11 13:04:42 +08:00
[CI/Build] Fix ppc64le CPU build and tests (#22443)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
Roberto L. Castro | 96ad65b7fe | 2025-10-10 09:43:40 -07:00
[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
Ming Yang | 3b736e1c38 | 2025-10-09 08:06:29 -07:00
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
Harry Mellor | d6953beb91 | 2025-10-05 07:06:22 -07:00
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
Fadi Arafeh | 9705fba7b7 | 2025-10-04 12:16:38 +08:00
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
Lucas Wilkinson | 418d111f8c | 2025-10-02 11:06:14 -04:00
[FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
Johnny | 5234dc7451 | 2025-10-01 10:50:54 -07:00
[NVIDIA] Blackwell Family (#24673)
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
|