14411 Commits

Author SHA1 Message Date
hallerite
b8401cde0e add regression test (#35834)
Signed-off-by: hallerite <git@hallerite.com>
2026-03-03 07:32:15 +00:00
TJian
5dfc5abe94 [ROCm] [Release] Change the package from aiter to amd-aiter (#35198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-02 23:13:39 -08:00
lin-shh
8fa68a8ce4 Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645)
2026-03-02 21:59:43 -08:00
lin-shh
35a6f0bfe2 [Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648)
2026-03-02 21:59:14 -08:00
Taneem Ibrahim
3a6cbf16e2 [MISC] Removed unused function find_all_indices() from tool_parsers/utils.py (#35683)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-03-03 13:58:42 +08:00
Lucas Wilkinson
f44d1ddc8c [BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-02 21:58:16 -08:00
Cyrus Leung
48a54c1e0d [CI/Build] Trigger processor tests on registry update (#35824)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-03 13:55:57 +08:00
Micah Williamson
8b9e8b7454 [ROCm][CI] Fix Assertion Logic For test_gpt_oss (#35806)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-03-03 05:08:04 +00:00
Wentao Ye
c21d0039ec [Refactor] Fix maxsim cuda platform and add cli to control it (#35427)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-03 12:48:31 +08:00
Isotr0py
7d8bbe6f42 [CI/Build] Automatically patch video metadata for multimodal processor test (#35822)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-03 04:27:45 +00:00
aykoppol
25e02647c2 [Core] Add optional flags to check for repetitive token patterns in engine output (#35451)
Signed-off-by: aykoppol <aykoppol+git@gmail.com>
2026-03-03 12:23:25 +08:00
Woosuk Kwon
a0a5178ab4 [Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-02 20:06:27 -08:00
Isotr0py
8ea8ba275e [V0 deprecation] Remove Swin model (#35821)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-02 20:03:41 -08:00
Woosuk Kwon
4f85bae9d6 [Docs][Model Runner V2] Add Design Docs (#35819)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-02 19:58:14 -08:00
Andy Lo
0a7165fd71 [ModelRunnerV2] Rename sampler functions and variables for clarity (#35459)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-03-02 19:48:56 -08:00
Robert Shaw
6521ccf286 [CI] Temporarily Disable Nightly Failures (#35770)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-03 01:49:13 +00:00
Martin Vit
8ebd872f50 [Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode (#35615)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:40:37 +08:00
zhrrr
168ee03e1c [Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph (#35376)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2026-03-02 17:10:47 -08:00
liuzhenwei
9dd656f0ea [XPU][NIXL] Add GPUDirect RDMA support for XPU (#35270)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-03 08:42:49 +08:00
Jakub Zakrzewski
c8b678e53e [Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
2026-03-03 08:32:14 +08:00
Andreas Karatzas
18c29c746b [ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success (#35798)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-02 16:07:51 -08:00
Hanjie Qiu
96fc09503a [All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793)
Signed-off-by: hjjq <hanjieq@nvidia.com>
2026-03-02 18:57:38 -05:00
Roger Wang
1b82b433fc [Bugfix] Fix MM processor test for Qwen3.5 (#35797)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-03-02 23:05:08 +00:00
Robert Shaw
9319044ee9 [MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-02 23:03:49 +00:00
Boyuan Feng
c42dc402c1 clean unused cudagraph_batch_sizes (#35552)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2026-03-02 22:00:16 +00:00
Ye (Charlotte) Qi
fa6a6be519 [Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2026-03-02 21:11:56 +00:00
Aaron Hao
cad21918e3 [BUG] Fix rlhf_async example (#35788)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-03-02 20:36:40 +00:00
Jeffrey Wang
53700bf49b [ci] Add Ray compatibility check informational CI job (#34672)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-03-02 12:06:16 -08:00
Yashwant Bezawada
a13d8c03c9 [KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057)
Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>
2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms
9433acb8df [Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
2026-03-02 14:29:09 -05:00
Richard Zou
d1a6e96d9e [torch.compile] Improve cold and warm start compile tests (#35709)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-02 19:27:06 +00:00
CSWYF3634076
2a9e3347e9 [BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2026-03-02 18:56:33 +00:00
Isotr0py
cc0d565f40 [CI/Build] Enable Qwen3.5 tests on CI (#35763)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-02 17:43:53 +00:00
Patryk Wolsza
358e4d5ba7 [CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307)
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
2026-03-02 17:02:26 +00:00
Cyrus Leung
792a74b973 [Doc] Improve UX of --enable-log-requests (#35723)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-02 08:24:09 -08:00
Turner Jabbour
4034c3d32e [Core] Move test utility to test file (#35672)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
2026-03-02 10:56:03 -05:00
Martin Hickey
7560d674c9 [CI] Fix mypy for vllm/device allocator (#35518)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-02 15:53:18 +00:00
ElizaWszola
d9c7730877 [Performance] Extract kv update ops from MLA attention backends (#34627)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Di Wu <dw2761@nyu.edu>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-03-02 10:43:19 -05:00
Runkai Tao
ada4f4fadd [Fix Bug]num_active_loras always equals to zero (#34119)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-02 23:17:46 +08:00
Harry Mellor
7e9149d9a9 [Docs] Add breadcrumbs for better UX (#35749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-02 14:31:54 +00:00
Martin Hickey
87c98b0236 [MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-02 13:23:42 +00:00
Tyler Michael Smith
de7dd634b9 Fix unresolved-import errors when using Astral's ty by removing src.root (#35681)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-03-02 10:26:47 +00:00
Chauncey
9a87b0578f [Feat] Supports Anthropic Messages count_tokens API (#35588)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-02 09:48:54 +00:00
wangxiyuan
510bc9e1df [Misc] Cleanup useless current_platform import (#35715)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-03-02 09:36:54 +00:00
Charles Ashby
cbd361fd46 [CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169)
Signed-off-by: Charles Ashby <charlesa.l@hotmail.com>
2026-03-02 09:34:35 +00:00
Nicolò Lucchesi
c212202d93 [Misc] Bound NIXL upper bound version (#35495)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-02 16:57:07 +08:00
Andreas Karatzas
ec27b36b4b [CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-02 08:10:54 +00:00
Charlie Fu
3fd1d4ec2c [Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)
Signed-off-by: charlifu <charlifu@amd.com>
2026-03-02 07:43:38 +00:00
EdalatiAli
cb21972a97 [Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-03-01 23:31:19 -08:00
Andreas Karatzas
c34963f138 [ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-02 15:04:18 +08:00