biondizzle/vllm - vllm

Author	SHA1	Message	Date
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Amir Samani	030fc44914	use the same stream for cuda graph catpure and replay for NCCL (#29207 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 19:10:03 +08:00
vllmellm	f32cfd7d97	[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-23 02:07:54 -08:00
Lucas Kabela	0db5439ded	[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 08:23:31 -08:00
Zhengxu Chen	53cd7f868b	[compile] Recompile graph module during Dynamo cache loading. (#30743 ) Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>	2025-12-17 02:00:12 -08:00
Zhengxu Chen	177c391db2	[compile] Disable aot when eager backend is used. (#30810 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 01:55:56 -08:00
ElizaWszola	994acec0cc	[Bugfix] Fix fusion for VL models (#30244 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-12-14 21:22:37 +08:00
Ilya Markov	3224ea9915	[torch.compile] Add encoder tag for compilation (#30489 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-12-14 18:15:11 +08:00
Laith Sakka	f569c654e1	enable unbacked with aot_compile (#30462 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-14 08:14:06 +00:00
Laith Sakka	763963aa73	set assume_32bit_indexing and pass unbacked hints (#30459 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-13 15:36:53 +00:00
Zhengxu Chen	fe1787107e	[compile] Parse compile range cache keys as Range during cache loading. (#30516 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-12 04:30:51 +00:00
Zhengxu Chen	92fea56fd1	[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-11 20:28:03 +00:00
Charlie Fu	3c680f4a17	[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-12-09 22:39:26 +00:00
Ilya Markov	0b6a8a304c	[BugFix] Fix non detected failing tests (#30277 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-12-09 17:57:55 +00:00
Laith Sakka	87aee9ed2b	Add evaluate_guards option to DynamicShapesConfig (#27432 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-08 10:46:15 -05:00
Ye (Charlotte) Qi	eb1051fb95	[ROCm] Guard group quant RMS norm fusion patterns (#30239 )	2025-12-08 14:44:48 +00:00
Jiangyun Zhu	d143271234	[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-12-08 06:43:47 +00:00
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Angela Yi	e7296b08da	[bugfix] Pass globals to aot_compiled function (#29428 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-12-05 16:54:26 +00:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Laith Sakka	5867819eaf	Do not guard during noop elimination pass (#30095 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-05 04:10:12 +00:00
Laith Sakka	1f0d184590	[aot_compile]change VLLM backend to read fake args from example_value (#29104 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-04 17:33:45 -05:00
elvischenv	afe9eb408e	[Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-12-03 18:50:53 +00:00
Yong Hoon Shin	69520bc695	Add logging for cudagraph related info (#29825 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-12-03 01:01:48 -08:00
elvischenv	c719c40540	[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 05:15:50 +00:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Didier Durand	fae6943068	[Doc]: fixing typos in multiple files. (#29685 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-28 08:41:41 -08:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
George D. Torres	56531b79cc	[Misc] Add backup hash algorithm for FIPS constrained environments (#28795 ) Signed-off-by: George D. Torres <gdavtor@gmail.com> Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-26 00:50:22 +00:00
Ilya Markov	e7d776273d	[Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-11-25 19:58:56 +00:00
Icey	888152bf87	Allow oot custom compiler extension via CompilerInterface (#28623 ) Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-11-25 15:25:15 +08:00
Laith Sakka	7a228b5305	Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-24 10:12:41 -05:00
Yanan Cao	933f67ecd8	[Bugfix]Fix a conditional to not check zero value (#28754 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-21 19:59:07 -08:00
zhrrr	a982f5b5ea	[kernel][perf] support uncontiguous input for rms_norm kernel (#28103 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-20 19:39:09 -08:00
Driss Guessous	3fd74189db	Fixes bench (#29058 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-11-20 21:21:54 +00:00
vnadathur	1ffe934c8a	[torch.compile] caching of config fields should be opt-out by default (#26468 ) Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 06:13:54 -08:00
Kunshang Ji	1b82fb0ad3	[XPU] work around for sp, avoid custom op import error (#28822 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-17 13:16:44 +00:00
Didier Durand	2bb4435cb7	[Doc]: fix typos in various files (#28567 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-15 19:27:50 +00:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Yanan Cao	262d263f6c	[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-13 15:09:05 -05:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
gmagogsfm	002b07c4b2	[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-05 12:22:44 -05:00
Boyuan Feng	6ab183813c	[Graph Partition][Cache] Use inductor partition ops config (#27702 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-05 13:04:48 +00:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00

230 Commits