biondizzle/vllm - vllm

Author	SHA1	Message	Date
Mark McLoughlin	e38817fadb	[Core][KV Connector] Remove use of num_cached_tokens in error handling (#38096 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-25 18:20:48 +00:00
Nick Hill	72cad44d3c	[Frontend] Move APIServerProcessManager target server fn (#38115 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 18:14:41 +00:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Yongye Zhu	678b3c99e8	[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050 )	2026-03-25 10:16:40 -07:00
mikaylagawarecki	bf4cc9ed2d	[2/n] Migrate per_token_group_quant to torch stable ABI (#36058 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-25 10:15:13 -07:00
Ben Browning	1ac2ef2e53	[CI/Docs] Improve aarch64/DGX Spark support for dev setup (#38057 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 09:24:42 -07:00
Richard Zou	6e37c46b35	[compile] Add some more startup tests for top models (#38046 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-25 12:02:22 -04:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
Necofish	e7221180e1	[Kernel] Optimize SM120 CUTLASS blockwise FP8 GEMM (#37970 ) Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-25 08:20:04 -07:00
RobTand	4a76ad12e0	[Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell (#37725 ) Signed-off-by: Rob Tand <robert.tand@icloud.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-03-25 08:18:25 -07:00
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Andrii Skliar	cd7643015e	[Feature] Support per-draft-model MoE backend via `--speculative-config` (#37880 ) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: [Andrii Skliar] <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-25 14:31:52 +00:00
Ben Browning	a1a2566447	[Docs] Add guide for editing agent instruction files (#37819 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2026-03-25 13:54:09 +00:00
yjz	b745e8b5d3	[KVTransfer][Mooncake] Add heterogeneous TP support for disaggregated P/D in MooncakeConnector (#36869 ) Signed-off-by: JianDan0212 <zhangyj0212@gmail.com>	2026-03-25 14:24:07 +01:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Fadi Arafeh	34d317dcec	[CPU][UX][Perf] Enable tcmalloc by default (#37607 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-25 20:39:57 +08:00
grYe99	7ac48fd357	[Model] Add AutoWeightsLoader support for jais (#38074 ) Signed-off-by: grYe99 <guorongye99@gmail.com> Co-authored-by: grYe99 <guorongye99@gmail.com>	2026-03-25 12:38:40 +00:00
Harry Mellor	d6bb2a9d9a	Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:29:49 +00:00
Harry Mellor	1e673a43ce	Better weight tying check for multimodal models (#38035 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:07:23 +00:00
Andreas Karatzas	04417ecd5f	[ROCm][CI] Rename filepath test to point to correct file (#38102 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 20:05:46 +08:00
R0CKSTAR	242c93f744	[Docs] Adds vllm-musa to custom_op.md (#37840 ) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-25 11:54:36 +00:00
Matthias Gehre	a889b7f584	[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-25 11:42:58 +00:00
Harry Mellor	ba2910f73a	Fix offline mode test for Transformers v5 (#38095 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 11:39:48 +00:00
Andreas Karatzas	f262a62aa1	[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 10:55:51 +00:00
Andreas Karatzas	9ac2fcafbb	[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 11:24:33 +01:00
Kunshang Ji	e9ae3f8077	[Hardware][XPU] Align memory usage with cuda on xpu (#37029 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-25 18:14:29 +08:00
Andreas Karatzas	04cec4f927	[ROCm][CI] Increase OpenAPI schema test timeouts (#38088 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 18:06:58 +08:00
Kunshang Ji	14771f7150	[XPU] support MLA model on Intel GPU (#37143 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-25 17:43:42 +08:00
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00
Chauncey	09c3dc9186	[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#37968 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-25 06:19:37 +00:00
vllmellm	42e9547976	[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-25 05:06:15 +00:00
Chauncey	a32783bb35	[Bugfix] Fix IndexError when accessing prev_tool_call_arr in OpenAIToolParser (#37958 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-25 12:06:21 +08:00
Baorun (Lauren) Mu	9d0351c91d	[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914 ) Signed-off-by: Baorun Mu <bmu@nvidia.com>	2026-03-24 19:53:24 -07:00
Artem Perevedentsev	a93a53f8a1	[Performance] Auto-enable prefetch on NFS with RAM guard (#37673 ) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-24 17:31:14 -07:00
Andreas Karatzas	679c6a3ecc	[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 08:17:33 +08:00
Andreas Karatzas	8bbb7c7f20	[ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-25 07:58:39 +08:00
Kevin H. Luu	af945615b5	[release] Move the rest of release jobs to release queue (#38044 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-03-24 16:40:58 -07:00
Terry Gao	82580b10ac	[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485 ) Signed-off-by: tianrengao <terrygao87@gmail.com> Co-authored-by: Tianren Gao <tianren@fb.com>	2026-03-24 19:37:51 -04:00
Netanel Haber	a0d487b2e1	nano_nemotron_vl: suppress readonly torch.from_numpy() warning in image and video resize paths (#37903 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-24 23:25:56 +00:00
Junhao	b73b5b0629	Make microbatch optimization (DBO) work with general models (#37926 ) Signed-off-by: Junhao Li <junhao@ubicloud.com>	2026-03-24 14:40:08 -07:00
Michael Goin	0f0e03890e	[UX] Add flashinfer-cubin as CUDA default dep (#37233 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-03-24 14:13:08 -07:00
Woosuk Kwon	4b53740d7f	[MRV2] Fix for DS v3.2 (#38030 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-24 14:03:24 -07:00
Nick Hill	4e824d1c83	[Model Runner V2][Minor] Simplify PP logic (#38031 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-24 13:57:17 -07:00
amey asgaonkar	0c1809c806	Add Ubuntu 24.04 support for Docker builds (#35386 ) Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>	2026-03-24 13:34:44 -07:00
liangel-02	8c47fdfdb1	[FlexAttention] allow custom mask mod (#37692 ) Signed-off-by: Angel Li <liangel@meta.com>	2026-03-24 16:03:24 -04:00
Javier De Jesus	54b0578ada	[Bugfix] Pass hf_token through config loading paths for gated model support (#37920 ) Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>	2026-03-24 15:22:05 -04:00
Richard Zou	89f572dbc0	[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-24 19:08:26 +00:00
Richard Zou	71a4a2fbd0	[BugFix] Fix order of compile logging (#38012 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-24 18:58:18 +00:00
Nick Cao	935c46dd9b	[Model] Add Granite 4.0 1B speech to supported models (#38019 ) Signed-off-by: Nick Cao <ncao@redhat.com>	2026-03-24 18:23:41 +00:00
Willy Hardy	057fc94cbd	[Bugfix] Fix structured output crash on CPU due to pin_memory=True (#37706 ) Signed-off-by: Willy Hardy <whardy@redhat.com> Signed-off-by: Will Hardy <whardy@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 17:44:17 +00:00


15344 Commits