Nicolò Lucchesi
|
4d022cbc75
|
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-11 17:06:14 +00:00 |
|
Tomasz Zielinski
|
34b2cf3b33
|
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
|
2025-04-11 07:38:36 -07:00 |
|
Nicolò Lucchesi
|
3cc9af88ff
|
[TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-10 17:05:44 -04:00 |
|
Michael Goin
|
baada0e737
|
[Bugfix][TPU] Fix TPU validate_request (#16369)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-04-10 12:55:12 +08:00 |
|
Joe Runde
|
cb391d85dc
|
[Hardware] add platform-specific request validation api (#16291)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-04-09 12:50:01 -07:00 |
|
yihong
|
04149cce27
|
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-09 03:43:59 -07:00 |
|
Shanshan Shen
|
e9ba99f296
|
[V1][Structured Output] Add supports_structured_output() method to Platform (#16148)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-07 11:06:24 +00:00 |
|
Ilya Markov
|
ef608c37a7
|
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-04-04 09:39:08 -07:00 |
|
Li, Jiang
|
2386803f2a
|
[CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-04 09:39:05 -07:00 |
|
Aleksandr Malyshev
|
57a810db9c
|
[ROCM][V0] PA kernel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-04-03 05:28:44 +00:00 |
|
Aleksandr Malyshev
|
e73ff24e31
|
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
|
2025-04-02 19:48:00 -07:00 |
|
Ilya Markov
|
b7b7676d67
|
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-03-31 22:49:12 -07:00 |
|
Kebe
|
4e0f6076be
|
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 10:13:41 +08:00 |
|
Joe Runde
|
5f063a80bd
|
[bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-25 15:00:32 -04:00 |
|
Thien Tran
|
4f044b1d67
|
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-25 09:34:59 +00:00 |
|
Cyrus Leung
|
6dd55af6c9
|
[Doc] Update docs on handling OOM (#15357)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-24 14:29:34 -07:00 |
|
Lucas Wilkinson
|
dccf535f8e
|
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-23 15:07:04 -07:00 |
|
Russell Bryant
|
b877031d80
|
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 14:06:39 -07:00 |
|
Isotr0py
|
f8a08cb90d
|
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-21 03:14:19 +00:00 |
|
Richard Liu
|
a8f12a63fd
|
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-20 14:59:33 +00:00 |
|
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
|
Yan Ma
|
9b87a579aa
|
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-17 01:22:14 -07:00 |
|
Lucas Wilkinson
|
1e799b7ec1
|
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910)
|
2025-03-17 03:35:37 +00:00 |
|
Li, Jiang
|
a2ae496589
|
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-03-14 22:07:36 -07:00 |
|
Alexander Matveev
|
7888e1d0a3
|
[V1] TPU - Enable prefix caching by default (#14773)
|
2025-03-13 20:40:05 -07:00 |
|
Siyuan Liu
|
1bc3b739c4
|
[V1][TPU] Add assertion on multi-step-scheduler (#14707)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-12 21:37:58 -07:00 |
|
Li, Jiang
|
ff47aab056
|
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-12 10:41:13 +00:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic dispatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
gnovack
|
d6123170d5
|
[Neuron] Add Neuron device communicator for vLLM v1 (#14085)
|
2025-03-10 18:37:04 -07:00 |
|
Harry Mellor
|
3b352a2f92
|
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 16:36:21 +00:00 |
|
Lucas Wilkinson
|
b0d541947a
|
[Attention] Default to FlashMLA backend for MLA (#14451)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 18:18:39 -08:00 |
|
youkaichao
|
6eaf93020d
|
[platforms] improve rocm debugging info (#14257)
|
2025-03-04 21:32:18 -08:00 |
|
Tyler Michael Smith
|
72c62eae5f
|
[V1] EP/TP MoE + DP Attention (#13931)
|
2025-03-04 21:27:26 -08:00 |
|
Michael Goin
|
6247bae6c6
|
[Bugfix] Restrict MacOS CPU detection (#14210)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-04 22:25:27 +08:00 |
|
youkaichao
|
ac65bc92df
|
[platform] add debug logging during inferring the device type (#14195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-04 18:39:16 +08:00 |
|
Cody Yu
|
989f4f430c
|
[Misc] Remove lru_cache in NvmlCudaPlatform (#14156)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-04 11:09:34 +08:00 |
|
Mengqing Cao
|
b87c21fc89
|
[Misc][Platform] Move use allgather to platform (#14010)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-03-03 15:40:04 +08:00 |
|
Woosuk Kwon
|
3b5567a209
|
[V1][Minor] Do not print attn backend twice (#13985)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-01 07:09:14 +00:00 |
|
Lucas Wilkinson
|
2e94b9cfbb
|
[Attention] Flash MLA for V1 (#13867)
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
|
2025-02-27 23:03:41 +00:00 |
|
Yang Chen
|
58d1b2aa77
|
[Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-02-27 13:14:17 -05:00 |
|
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |
|
cjackal
|
51010a1807
|
[Misc] set single whitespace between log sentences (#13771)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-02-25 10:26:12 +08:00 |
|
Alex Brooks
|
9621667874
|
[Misc] Warn if the vLLM version can't be retrieved (#13501)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-02-20 06:24:48 +00:00 |
|
Cyrus Leung
|
435b502a6e
|
[ROCm] Make amdsmi import optional for other platforms (#13460)
|
2025-02-18 03:15:56 -08:00 |
|
Divakar Verma
|
7c7adf81fc
|
[ROCm] fix get_device_name for rocm (#13438)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-02-18 04:07:12 +00:00 |
|
Yan Ma
|
30513d1cb6
|
[Bugfix] fix xpu communicator (#13368)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-02-17 20:59:18 +08:00 |
|
Mengqing Cao
|
238dfc8ac3
|
[MISC] tiny fixes (#13378)
|
2025-02-17 00:57:13 -08:00 |
|
Isotr0py
|
d67cc21b78
|
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-16 18:55:27 +00:00 |
|
youkaichao
|
a0231b7c25
|
[platform] add base class for communicators (#13208)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-16 22:14:22 +08:00 |
|
Lily Liu
|
80f63a3966
|
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-15 18:05:11 -08:00 |
|