biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Fadi Arafeh	34d317dcec	[CPU][UX][Perf] Enable tcmalloc by default (#37607 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-25 20:39:57 +08:00
grYe99	7ac48fd357	[Model] Add AutoWeightsLoader support for jais (#38074 ) Signed-off-by: grYe99 <guorongye99@gmail.com> Co-authored-by: grYe99 <guorongye99@gmail.com>	2026-03-25 12:38:40 +00:00
Harry Mellor	d6bb2a9d9a	Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:29:49 +00:00
Harry Mellor	1e673a43ce	Better weight tying check for multimodal models (#38035 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:07:23 +00:00
Andreas Karatzas	04417ecd5f	[ROCm][CI] Rename filepath test to point to correct file (#38102 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 20:05:46 +08:00
R0CKSTAR	242c93f744	[Docs] Adds vllm-musa to custom_op.md (#37840 ) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-25 11:54:36 +00:00
Matthias Gehre	a889b7f584	[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-25 11:42:58 +00:00
Harry Mellor	ba2910f73a	Fix offline mode test for Transformers v5 (#38095 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 11:39:48 +00:00
Andreas Karatzas	f262a62aa1	[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 10:55:51 +00:00
Andreas Karatzas	9ac2fcafbb	[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 11:24:33 +01:00
Kunshang Ji	e9ae3f8077	[Hardware][XPU] Align memory usage with cuda on xpu (#37029 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-25 18:14:29 +08:00
Andreas Karatzas	04cec4f927	[ROCm][CI] Increase OpenAPI schema test timeouts (#38088 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 18:06:58 +08:00
Kunshang Ji	14771f7150	[XPU] support MLA model on Intel GPU (#37143 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-25 17:43:42 +08:00
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00
Chauncey	09c3dc9186	[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#37968 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-25 06:19:37 +00:00
vllmellm	42e9547976	[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-25 05:06:15 +00:00
Chauncey	a32783bb35	[Bugfix] Fix IndexError when accessing prev_tool_call_arr in OpenAIToolParser (#37958 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-25 12:06:21 +08:00
Baorun (Lauren) Mu	9d0351c91d	[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914 ) Signed-off-by: Baorun Mu <bmu@nvidia.com>	2026-03-24 19:53:24 -07:00
Artem Perevedentsev	a93a53f8a1	[Performance] Auto-enable prefetch on NFS with RAM guard (#37673 ) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-24 17:31:14 -07:00
Andreas Karatzas	679c6a3ecc	[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 08:17:33 +08:00
Andreas Karatzas	8bbb7c7f20	[ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 07:58:39 +08:00
Kevin H. Luu	af945615b5	[release] Move the rest of release jobs to release queue (#38044 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-03-24 16:40:58 -07:00
Terry Gao	82580b10ac	[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485 ) Signed-off-by: tianrengao <terrygao87@gmail.com> Co-authored-by: Tianren Gao <tianren@fb.com>	2026-03-24 19:37:51 -04:00
Netanel Haber	a0d487b2e1	nano_nemotron_vl: suppress readonly torch.from_numpy() warning in image and video resize paths (#37903 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-24 23:25:56 +00:00
Junhao	b73b5b0629	Make microbatch optimization (DBO) work with general models (#37926 ) Signed-off-by: Junhao Li <junhao@ubicloud.com>	2026-03-24 14:40:08 -07:00
Michael Goin	0f0e03890e	[UX] Add flashinfer-cubin as CUDA default dep (#37233 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-03-24 14:13:08 -07:00
Woosuk Kwon	4b53740d7f	[MRV2] Fix for DS v3.2 (#38030 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-24 14:03:24 -07:00
Nick Hill	4e824d1c83	[Model Runner V2][Minor] Simplify PP logic (#38031 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-24 13:57:17 -07:00
amey asgaonkar	0c1809c806	Add Ubuntu 24.04 support for Docker builds (#35386 ) Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>	2026-03-24 13:34:44 -07:00
liangel-02	8c47fdfdb1	[FlexAttention] allow custom mask mod (#37692 ) Signed-off-by: Angel Li <liangel@meta.com>	2026-03-24 16:03:24 -04:00
Javier De Jesus	54b0578ada	[Bugfix] Pass hf_token through config loading paths for gated model support (#37920 ) Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>	2026-03-24 15:22:05 -04:00
Richard Zou	89f572dbc0	[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-24 19:08:26 +00:00
Richard Zou	71a4a2fbd0	[BugFix] Fix order of compile logging (#38012 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-24 18:58:18 +00:00
Nick Cao	935c46dd9b	[Model] Add Granite 4.0 1B speech to supported models (#38019 ) Signed-off-by: Nick Cao <ncao@redhat.com>	2026-03-24 18:23:41 +00:00
Willy Hardy	057fc94cbd	[Bugfix] Fix structured output crash on CPU due to pin_memory=True (#37706 ) Signed-off-by: Willy Hardy <whardy@redhat.com> Signed-off-by: Will Hardy <whardy@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 17:44:17 +00:00
Vineeta Tiwari	b58c5f28aa	docs: fix broken offline inference paths in documentation (#37998 ) Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com> Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-24 17:35:14 +00:00
Ming Yang	c07e2ca6e0	Fix Mamba state corruption from referencing stale block table entries (#37728 )	2026-03-24 10:29:59 -07:00
Dhruv Singal	4df5fa7439	[Bugfix] Force continuous usage stats when CLI override is enabled (#37923 ) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: OpenCode <noreply@openai.com>	2026-03-24 10:29:50 -07:00
sihao_li	a5416bc52e	[XPU] Support Intel XPU hardware information collection in usage stats (#37964 ) Signed-off-by: sihao.li <sihao.li@intel.com>	2026-03-24 10:29:17 -07:00
Harry Mellor	b3601da6e7	[Mypy] Fix mypy for `vllm/model_executor` (except `vllm/model_executor/layers`) (#37904 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 17:14:01 +00:00
Dan Blanaru	dc78c2c933	[Core] add option to schedule requests based on full ISL (#37307 ) Signed-off-by: Dan Blanaru <48605845+DanBlanaru@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-24 13:01:12 -04:00
Sungjae Lee	4731884796	[Feature] limit thinking tokens (hard limit) (#20859 ) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 09:53:07 -07:00
Harry Mellor	8de5261e69	Update new contributor message (#37999 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 16:01:41 +00:00
wang.yuqi	1b6cb920e6	[Deprecate] Deprecate pooling multi task support. (#37956 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-24 14:07:47 +00:00
Li, Jiang	352b90c4a4	[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-24 07:00:20 -07:00
Sage	1c0aabdeb0	[Bugfix] Suppress spurious CPU KV cache warning in `launch render` (#37911 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-24 12:36:18 +00:00
Ilya Markov	14acf429ac	[EPLB] Remove main waits in case of slow EPLB (#36271 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-03-24 11:50:44 +00:00
Harry Mellor	ce57fd5557	[Docs] Fix build (#37991 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 03:20:49 -07:00
Flora Feng	2e67fa756d	Fix tool_parser_cls type annotation from Callable to type[ToolParser] (#37957 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-23 22:58:27 -07:00
Ronen Schaffer	e3c6c10cad	[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into `cpu/` package (#37874 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-24 07:02:51 +02:00

15229 Commits