biondizzle/vllm - vllm

Author	SHA1	Message	Date
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Hari	a3e2e250f0	[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614 ) Signed-off-by: hasethuraman <hsethuraman@microsoft.com>	2026-03-15 19:38:21 +08:00
Mark McLoughlin	7afe0faab1	[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 12:10:06 -07:00
Mark McLoughlin	234860399b	[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270 ) (#36628 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-10 06:20:41 -07:00
Copilot	4b87ffbefb	[torch.compile] Rename `compile_ranges_split_points` to `compile_ranges_endpoints` (#36027 ) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-09 18:04:40 +00:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
Tushar Shetty	c4d859c274	[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243 ) Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com> Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>	2026-03-08 20:40:16 -07:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Mark McLoughlin	27066d1b2b	[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-05 22:04:31 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
sungsoo ha	6cb901093f	[Core] Add All-to-All communication backend for DCP (#34883 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: sungsoo ha <hasungsoo@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:01:57 -05:00
Yashwant Bezawada	a13d8c03c9	[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057 ) Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>	2026-03-02 15:04:47 -05:00
Richard Zou	e82fbeec7b	[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-01 21:44:22 +00:00
Jason Li	66c1751d13	[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism (#35410 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>	2026-02-27 08:36:37 -05:00
Jiangyun Zhu	487e5c51f7	[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 (#35424 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-27 04:18:52 +00:00
Gregory Shtrasberg	6042e66cd5	[ROCm] Add extra step in config initialization to populate custom ops before compilation config init (#34848 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-26 16:05:40 +08:00
Jason Li	9d37941017	[torch.compile] Sequence Parallelism threshold compile ranges (#28672 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-26 05:00:12 +00:00
Michael Goin	cbf8f7028c	[UX] Add `--performance-mode {balanced,interactivity,throughput}` (#34936 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-25 17:28:31 -08:00
Ming Yang	6831650c40	[offloader] v2: Hide weight onloading latency via prefetching (#29941 ) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 17:20:59 -08:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726 ) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Benjamin Chislett	682566b18e	[Bug] Refactor max_num_batched_tokens to account for drafting (#34898 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-22 11:18:46 -05:00
Lucas Wilkinson	aaefc58ee0	[CI] Revert PRs 34818 and 33600 (#34979 )	2026-02-20 13:25:50 -08:00
Matthew Bonanni	662205d34e	[Bugfix] Fix Basic Models Test (#34818 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 14:49:07 -08:00
Luka Govedič	02e8f26cea	[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-18 03:29:15 +00:00
Luka Govedič	23d825aba1	[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-15 06:33:57 -08:00
Thomas Parnell	d5fe3f702c	[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-02-14 13:15:56 -08:00
Richard Zou	87789c8364	[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-13 09:52:20 -08:00
Harry Huang	6f019e6e0a	[BugFix] Add block_size validation for mamba cache align mode (#34445 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-12 18:18:07 -08:00
SorenDreano	48134a2c22	[Docs] Fix typo ("defult") and double spacing (#34348 ) Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com> Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-11 09:02:27 -08:00
Luka Govedič	addac0e653	[torch.compile] Enable AR+rms fusion by default available for `-O2` (#34299 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-11 00:30:00 -08:00
Richard Zou	4df841fe75	[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-08 18:42:56 +00:00
Mohammad Miadh Angkad	dd6a6e1190	[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 05:24:44 -08:00
rasmith	ec28784fdc	[CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-07 02:43:25 +00:00
Benjamin Chislett	af3162d3aa	[Spec Decode] Unified Parallel Drafting (#32887 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-05 12:37:18 -05:00
Aaron Hao	c1858b7ec8	[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-05 12:13:23 -05:00
Luka Govedič	4d9513537d	[CI][torch.compile] Reduce e2e fusion test time (#33293 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-04 19:09:03 -05:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Frank Wang	8f5d51203b	Disable Cascade Attention for Batch Invariance (#32561 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com> Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-30 10:00:46 -05:00
Harry Huang	ec51831a22	[BugFix] Disable async scheduling for Mamba prefix caching (#33352 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-01-30 04:40:19 +00:00
Harry Mellor	80b918f2bd	Fix `tie_word_embeddings` for multimodal models in Transformers v5 (#33359 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 03:37:39 +00:00
Lucas Wilkinson	a650ad1588	[Misc] Remove missed `pad_for_cudagraph` (#33283 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-29 09:12:05 +00:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Hiroken.	1209b784f2	[Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842 ) Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com> Signed-off-by: Xingran Wang <wangxingran123456@outlook.com> Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>	2026-01-24 14:45:14 +00:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
Nick Hill	9b693d023c	[Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 17:03:39 +00:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00

128 Commits