biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Wentao Ye	c59a132f96	[V0 Deprecation] Refactor kv cache from list to element (#37487 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 20:10:11 -07:00
Andreas Karatzas	66f927f205	[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-22 03:22:24 +00:00
Francesco Fusco	298e510848	[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318 ) Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>	2026-03-21 09:29:43 +00:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094 ) (#34571 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Huamin Li	157722da75	[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2026-02-28 01:50:37 +08:00
Nicolò Lucchesi	cabdaa7619	[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35400 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-27 11:42:51 +08:00
Chen Zhang	8fae54faff	[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-24 22:00:19 -08:00
Kyle Sayers	f857a03f6b	[QeRL] Layerwise Reloading (#32133 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-01-30 08:50:05 -07:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
David Ramon Prados	3a63be0faa	Support custom URI schemes and trace handlers for profiler (#32393 )	2026-01-22 09:45:40 -08:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Jialin Ouyang	6e9dbcc50e	[Fix] uniform decode batch check (#30747 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-17 19:58:43 +08:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
rasmith	f2f4cea6cc	[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-04 09:30:22 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Nicolò Lucchesi	066209a045	[Attention] Refactor FA `block_size` limitations to hybrid models only (#29084 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-22 06:38:44 -08:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
Benjamin Chislett	fcbcba6c70	[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 19:17:48 -08:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Nicolò Lucchesi	96b23b8e3b	[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-14 22:40:05 +08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696 ) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Chen Zhang	df334868ca	[Hybrid] A simpler algorithm to find kernel_block_size (#26476 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-31 21:30:28 +00:00
Nick Hill	2ce5c5d3d6	[BugFix] Handle unscheduled requests properly when async scheduling (#27756 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-29 21:04:25 -07:00
Jonathan Chen	ca76486a16	[Chore] Separate out `vllm.utils.platform_utils.py` (#27374 ) Signed-off-by: Jonathan <chenleejonathan@gmail.com>	2025-10-23 19:08:06 +00:00
dongbo910220	a0003b56b0	[Chore] Separate out system utilities from vllm.utils (#27201 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 20:25:25 +00:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
Nick Hill	4aed506b65	[Core] Streamline some structured output related code (#26737 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-14 23:27:44 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486 ) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926 ) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Sergei Skvortsov	6ebaf43ee4	[V1] Logit processors for rejection sampler (#19482 ) Signed-off-by: southfreebird <yvorott@gmail.com> Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: Sergei Skvortsov <yvorott@gmail.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-07 13:02:49 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
David Ben-David	9a9f48dff7	[V1] [P/D] Add Support for KV Load Failure Recovery (#19330 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-09-30 14:57:08 -07:00

107 Commits