Commit Graph

156 Commits

Author SHA1 Message Date
Cyrus Leung
8c017b3490 [Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 05:03:35 +00:00
wangxiyuan
8f4b313c37 [Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 12:11:48 +00:00
wangxiyuan
db1764e4e0 [Platform] allow platform to init dp group (#22243)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 02:32:17 -07:00
Harry Mellor
8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Matt
de253d63b7 [Hardware][AMD] Enable FlexAttention backend on ROCm (#26439)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2025-10-09 06:20:18 +00:00
Gregory Shtrasberg
f231e5bc21 [ROCm] Split AITER unified attention into its own backend (#25507)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-10-06 22:49:23 +00:00
Harry Mellor
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
TJian
9c5ee91b2a [ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-10-02 22:34:53 -07:00
Matthew Bonanni
2aaa423842 [Attention] Move Backend enum into registry (#25893)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-02 20:32:24 -07:00
Yongye Zhu
fa7e254a7f [New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00
Aaron Pham
6a113d9aed [V0 Deprecation] Remove vllm.worker and update according imports (#25901) 2025-09-29 23:26:11 +00:00
Matthew Bonanni
3468f17ebe [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-25 17:37:50 +00:00
Gregory Shtrasberg
487745ff49 [ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled (#25275)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-24 11:24:39 -04:00
Juan Villamizar
bde2a1a8a4 [ROCm] Small functional changes for gptoss (#25201)
Signed-off-by: jpvillam <jpvillam@amd.com>
Co-authored-by: jpvillam <jpvillam@amd.com>
2025-09-23 23:39:50 +00:00
Isotr0py
6fa78d8f23 [V0 deprecation] Remove platform v1 controling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 01:48:12 +00:00
Burkhard Ringlein
175811e3b5 [V1][Attention] Split triton_attn in triton-only and rocm specific backends (#24648)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
2025-09-22 15:20:28 +00:00
Yizhou
b6f01bd9a7 refactor: abstract graph mode support into platform interface (#25161)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-09-22 10:22:29 +00:00
Woosuk Kwon
bc6e542d9f Remove V0 attention backends (#25351)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 16:03:28 -07:00
Woosuk Kwon
0ff8ebb2d7 [V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 08:52:32 -07:00
Woosuk Kwon
5801e49776 [V0 Deprecation] Remove MQLLMEngine (#25019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Wenlong Wang
72fc8aa412 [Multi Modal] Add FA3 in VIT (#24347)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-12 21:27:24 +08:00
Mengqing Cao
4f6593b058 [HybridKVCache][Platform] Add support_hybrid_kv_cache for platform (#24646)
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-09-11 21:47:58 +08:00
Lifans
d6069887c6 [rocm] enable torchao quantization for rocm (#24400)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-09-10 06:16:21 -07:00
vllmellm
7c195d43da [ROCm][Bugfix] Fix Aiter RMSNorm (#23412)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-09-10 21:08:03 +08:00
Kunshang Ji
fce10dbed5 [XPU] Add xpu torch.compile support (#22609)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-27 05:33:27 +00:00
Chaojun Zhang
8a044754bd [XPU] Delay BF16 check to worker init for spawn compatibility (#22979)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-08-25 13:09:26 -07:00
Daifeng Li
fa78de9dc3 Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527)
Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-22 20:53:21 -06:00
Matthew Bonanni
19fe1a0510 [Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-08-22 02:26:32 +00:00
fhl2000
74f441f4b5 [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-08-15 10:01:39 -04:00
Woosuk Kwon
71683ca6f6 [V0 Deprecation] Remove multi-step scheduling (#22138)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:18:39 -07:00
Yongye Zhu
007dd90859 [gpt-oss] Enable gpt-oss on ampere (#22714)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-12 03:21:44 -07:00
Woosuk Kwon
98a3a81024 [ROCm] Add attention sink to use_rocm_custom_paged_attention (#22329)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-05 23:30:38 -07:00
vllmellm
d3a6f2120b [FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069)
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
2025-08-01 23:53:18 -07:00
Konrad Zawora
c17231e827 Fix kv_cache_dtype handling for out-of-tree HPU plugin (#21302)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
2025-07-21 23:35:14 -07:00
Woosuk Kwon
dd572c0ab3 [V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-18 21:47:50 -07:00
Woosuk Kwon
54cf1cae62 [Misc] Do not print async output warning for v1 (#21151)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-17 21:57:02 -07:00
Gregory Shtrasberg
5b8366b61a [ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-10 09:22:23 -07:00
Kunshang Ji
0b407479ef [misc]refactor Platform.set_device method (#20262)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-09 01:39:47 +00:00
Yang Yang
6e2c19ce22 [Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410)
Signed-off-by: dbyoung18 <yang5.yang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-07 04:32:32 +00:00
Liangliang Ma
a0389e0554 [UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-02 09:06:04 +08:00
TY-AMD
96453cfa83 [BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
2025-07-01 16:12:19 +08:00
Zzz9990
8b6e1d639c [Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-06-18 08:46:51 -07:00
Charlie Fu
a44b1c951d [Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
Simon Mo
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Charlie Fu
306d60401d [ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-31 07:40:05 -07:00
Gregory Shtrasberg
1b7cfd5a36 [ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-29 12:13:18 -04:00
Mengqing Cao
d781930f90 [Platform][Dist] Make torch distributed process group extendable (#18763)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-05-28 10:52:34 +00:00
fxmarty-amd
794ae1f551 [rocm] Fix wrong attention log (#18764)
Signed-off-by: Felix Marty <felmarty@amd.com>
2025-05-27 19:45:41 -07:00
Ning Xie
60cad94b86 [Hardware] correct method signatures for HPU,ROCm,XPU (#18551)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-22 22:31:59 -07:00
Mengqing Cao
f8d2cc5f55 [Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-05-22 12:11:53 -07:00