biondizzle/vllm

Author	SHA1	Message	Date
aykoppol	25e02647c2	[Core] Add optional flags to check for repetitive token patterns in engine output (#35451) Signed-off-by: aykoppol <aykoppol+git@gmail.com>	2026-03-03 12:23:25 +08:00
Woosuk Kwon	a0a5178ab4	[Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 20:06:27 -08:00
Isotr0py	8ea8ba275e	[V0 deprecation] Remove Swin model (#35821) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 20:03:41 -08:00
Woosuk Kwon	4f85bae9d6	[Docs][Model Runner V2] Add Design Docs (#35819) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 19:58:14 -08:00
Andy Lo	0a7165fd71	[ModelRunnerV2] Rename sampler functions and variables for clarity (#35459) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-02 19:48:56 -08:00
Robert Shaw	6521ccf286	[CI] Temporarily Disable Nightly Failures (#35770) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-03 01:49:13 +00:00
Martin Vit	8ebd872f50	[Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode (#35615) Signed-off-by: Martin Vit <martin@voipmonitor.org> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 09:40:37 +08:00
zhrrr	168ee03e1c	[Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph (#35376) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-03-02 17:10:47 -08:00
liuzhenwei	9dd656f0ea	[XPU][NIXL] Add GPUDirect RDMA support for XPU (#35270) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-03 08:42:49 +08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Andreas Karatzas	18c29c746b	[ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success (#35798) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 16:07:51 -08:00
Hanjie Qiu	96fc09503a	[All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793) Signed-off-by: hjjq <hanjieq@nvidia.com>	2026-03-02 18:57:38 -05:00
Roger Wang	1b82b433fc	[Bugfix] Fix MM processor test for Qwen3.5 (#35797) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-03-02 23:05:08 +00:00
Robert Shaw	9319044ee9	[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-02 23:03:49 +00:00
Boyuan Feng	c42dc402c1	clean unused cudagraph_batch_sizes (#35552) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-03-02 22:00:16 +00:00
Ye (Charlotte) Qi	fa6a6be519	[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2026-03-02 21:11:56 +00:00
Aaron Hao	cad21918e3	[BUG] Fix rlhf_async example (#35788) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-02 20:36:40 +00:00
Jeffrey Wang	53700bf49b	[ci] Add Ray compatibility check informational CI job (#34672) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-03-02 12:06:16 -08:00
Yashwant Bezawada	a13d8c03c9	[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057) Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>	2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
Richard Zou	d1a6e96d9e	[torch.compile] Improve cold and warm start compile tests (#35709) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-02 19:27:06 +00:00
CSWYF3634076	2a9e3347e9	[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2026-03-02 18:56:33 +00:00
Isotr0py	cc0d565f40	[CI/Build] Enable Qwen3.5 tests on CI (#35763) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 17:43:53 +00:00
Patryk Wolsza	358e4d5ba7	[CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307) Signed-off-by: PatrykWo <patryk.wolsza@intel.com>	2026-03-02 17:02:26 +00:00
Cyrus Leung	792a74b973	[Doc] Improve UX of `--enable-log-requests` (#35723) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-02 08:24:09 -08:00
Turner Jabbour	4034c3d32e	[Core] Move test utility to test file (#35672) Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>	2026-03-02 10:56:03 -05:00
Martin Hickey	7560d674c9	[CI] Fix mypy for vllm/device allocator (#35518) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-02 15:53:18 +00:00
ElizaWszola	d9c7730877	[Performance] Extract kv update ops from MLA attention backends (#34627) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Di Wu <dw2761@nyu.edu> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-02 10:43:19 -05:00
Runkai Tao	ada4f4fadd	[Fix Bug]`num_active_loras` always equals to zero (#34119) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-02 23:17:46 +08:00
Harry Mellor	7e9149d9a9	[Docs] Add breadcrumbs for better UX (#35749) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-02 14:31:54 +00:00
Martin Hickey	87c98b0236	[MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-02 13:23:42 +00:00
Tyler Michael Smith	de7dd634b9	Fix unresolved-import errors when using Astral's ty by removing src.root (#35681) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-03-02 10:26:47 +00:00
Chauncey	9a87b0578f	[Feat] Supports Anthropic Messages count_tokens API (#35588) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-02 09:48:54 +00:00
wangxiyuan	510bc9e1df	[Misc] Cleanup useless `current_platform` import (#35715) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-02 09:36:54 +00:00
Charles Ashby	cbd361fd46	[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169) Signed-off-by: Charles Ashby <charlesa.l@hotmail.com>	2026-03-02 09:34:35 +00:00
Nicolò Lucchesi	c212202d93	[Misc] Bound NIXL upper bound version (#35495) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-02 16:57:07 +08:00
Andreas Karatzas	ec27b36b4b	[CI] Defining extended V1 e2e + engine tests (#35580) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 08:10:54 +00:00
Charlie Fu	3fd1d4ec2c	[Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750) Signed-off-by: charlifu <charlifu@amd.com>	2026-03-02 07:43:38 +00:00
EdalatiAli	cb21972a97	[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448) Signed-off-by: EdalatiAli <aliedalati@cohere.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-01 23:31:19 -08:00
Andreas Karatzas	c34963f138	[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 15:04:18 +08:00
Hongxia Yang	f26650d649	[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-02 06:02:43 +00:00
Kunshang Ji	92f5d0f070	[XPU] fix mxfp4 activation type (#35691) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-02 11:48:39 +08:00
Jesse Cai	a60985b07e	Fix deprecated v1 config tests (#35327) Signed-off-by: Jesse Cai <jessecai@fb.com>	2026-03-01 20:32:03 -05:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
zhanqiuhu	57a96e26c9	Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832) Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-03-01 22:32:37 +00:00
Richard Zou	e82fbeec7b	[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-01 21:44:22 +00:00
haosdent	6290470843	[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-01 15:14:46 -05:00
Woosuk Kwon	72f4d16262	[Model Runner V2] Use block table apis for capture inputs (#35671) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-01 10:31:13 -08:00
Seungho Yoon	5a435507d8	fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382) Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com>	2026-03-01 09:59:30 -05:00
Taneem Ibrahim	59d7af9c6c	[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-03-01 09:26:44 -05:00

14451 Commits