Commit Graph

144 Commits

Author SHA1 Message Date
Nicolò Lucchesi
4d022cbc75 [TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-11 17:06:14 +00:00
Tomasz Zielinski
34b2cf3b33 [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
2025-04-11 07:38:36 -07:00
Nicolò Lucchesi
3cc9af88ff [TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:05:44 -04:00
Michael Goin
baada0e737 [Bugfix][TPU] Fix TPU validate_request (#16369)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-04-10 12:55:12 +08:00
Joe Runde
cb391d85dc [Hardware] add platform-specific request validation api (#16291)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-04-09 12:50:01 -07:00
yihong
04149cce27 [BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-09 03:43:59 -07:00
Shanshan Shen
e9ba99f296 [V1][Structured Output] Add supports_structured_output() method to Platform (#16148)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-07 11:06:24 +00:00
Ilya Markov
ef608c37a7 [Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-04-04 09:39:08 -07:00
Li, Jiang
2386803f2a [CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-04 09:39:05 -07:00
Aleksandr Malyshev
57a810db9c [ROCM][V0] PA kennel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-04-03 05:28:44 +00:00
Aleksandr Malyshev
e73ff24e31 [ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
Ilya Markov
b7b7676d67 [Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-03-31 22:49:12 -07:00
Kebe
4e0f6076be [Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 10:13:41 +08:00
Joe Runde
5f063a80bd [bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-25 15:00:32 -04:00
Thien Tran
4f044b1d67 [Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
Cyrus Leung
6dd55af6c9 [Doc] Update docs on handling OOM (#15357)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 14:29:34 -07:00
Lucas Wilkinson
dccf535f8e [V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
Russell Bryant
b877031d80 Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Isotr0py
f8a08cb90d [V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-21 03:14:19 +00:00
Richard Liu
a8f12a63fd Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-20 14:59:33 +00:00
Mickaël Seznec
a597a57595 [Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-03-20 01:14:20 -04:00
Yan Ma
9b87a579aa [Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-03-17 01:22:14 -07:00
Lucas Wilkinson
1e799b7ec1 [BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910) 2025-03-17 03:35:37 +00:00
Li, Jiang
a2ae496589 [CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Alexander Matveev
7888e1d0a3 [V1] TPU - Enable prefix caching by default (#14773) 2025-03-13 20:40:05 -07:00
Siyuan Liu
1bc3b739c4 [V1][TPU] Add assertion on multi-step-scheduler (#14707)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-12 21:37:58 -07:00
Li, Jiang
ff47aab056 [CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
Jeff Daily
a1c8f3796c dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
gnovack
d6123170d5 [Neuron] Add Neuron device communicator for vLLM v1 (#14085) 2025-03-10 18:37:04 -07:00
Harry Mellor
3b352a2f92 Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Lucas Wilkinson
b0d541947a [Attention] Default to FlashMLA backend for MLA (#14451)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 18:18:39 -08:00
youkaichao
6eaf93020d [platforms] improve rocm debugging info (#14257) 2025-03-04 21:32:18 -08:00
Tyler Michael Smith
72c62eae5f [V1] EP/TP MoE + DP Attention (#13931) 2025-03-04 21:27:26 -08:00
Michael Goin
6247bae6c6 [Bugfix] Restrict MacOS CPU detection (#14210)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 22:25:27 +08:00
youkaichao
ac65bc92df [platform] add debug logging during inferring the device type (#14195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:39:16 +08:00
Cody Yu
989f4f430c [Misc] Remove lru_cache in NvmlCudaPlatform (#14156)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 11:09:34 +08:00
Mengqing Cao
b87c21fc89 [Misc][Platform] Move use allgather to platform (#14010)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-03 15:40:04 +08:00
Woosuk Kwon
3b5567a209 [V1][Minor] Do not print attn backend twice (#13985)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-01 07:09:14 +00:00
Lucas Wilkinson
2e94b9cfbb [Attention] Flash MLA for V1 (#13867)
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
2025-02-27 23:03:41 +00:00
Yang Chen
58d1b2aa77 [Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-27 13:14:17 -05:00
Lucas Wilkinson
f95903909f [Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
cjackal
51010a1807 [Misc] set single whitespace between log sentences (#13771)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-02-25 10:26:12 +08:00
Alex Brooks
9621667874 [Misc] Warn if the vLLM version can't be retrieved (#13501)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-20 06:24:48 +00:00
Cyrus Leung
435b502a6e [ROCm] Make amdsmi import optional for other platforms (#13460) 2025-02-18 03:15:56 -08:00
Divakar Verma
7c7adf81fc [ROCm] fix get_device_name for rocm (#13438)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-18 04:07:12 +00:00
Yan Ma
30513d1cb6 [Bugfix] fix xpu communicator (#13368)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-02-17 20:59:18 +08:00
Mengqing Cao
238dfc8ac3 [MISC] tiny fixes (#13378) 2025-02-17 00:57:13 -08:00
Isotr0py
d67cc21b78 [Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-16 18:55:27 +00:00
youkaichao
a0231b7c25 [platform] add base class for communicators (#13208)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:14:22 +08:00
Lily Liu
80f63a3966 [V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-15 18:05:11 -08:00