Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Amir Samani
|
030fc44914
|
use the same stream for cuda graph capture and replay for NCCL (#29207)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-25 19:10:03 +08:00 |
|
vllmellm
|
f32cfd7d97
|
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-23 02:07:54 -08:00 |
|
Lucas Kabela
|
0db5439ded
|
[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 08:23:31 -08:00 |
|
Zhengxu Chen
|
53cd7f868b
|
[compile] Recompile graph module during Dynamo cache loading. (#30743)
Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>
|
2025-12-17 02:00:12 -08:00 |
|
Zhengxu Chen
|
177c391db2
|
[compile] Disable aot when eager backend is used. (#30810)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 01:55:56 -08:00 |
|
ElizaWszola
|
994acec0cc
|
[Bugfix] Fix fusion for VL models (#30244)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-12-14 21:22:37 +08:00 |
|
Ilya Markov
|
3224ea9915
|
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-14 18:15:11 +08:00 |
|
Laith Sakka
|
f569c654e1
|
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-14 08:14:06 +00:00 |
|
Laith Sakka
|
763963aa73
|
set assume_32bit_indexing and pass unbacked hints (#30459)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-13 15:36:53 +00:00 |
|
Zhengxu Chen
|
fe1787107e
|
[compile] Parse compile range cache keys as Range during cache loading. (#30516)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-12 04:30:51 +00:00 |
|
Zhengxu Chen
|
92fea56fd1
|
[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-11 20:28:03 +00:00 |
|
Charlie Fu
|
3c680f4a17
|
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-12-09 22:39:26 +00:00 |
|
Ilya Markov
|
0b6a8a304c
|
[BugFix] Fix undetected failing tests (#30277)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-09 17:57:55 +00:00 |
|
Laith Sakka
|
87aee9ed2b
|
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-08 10:46:15 -05:00 |
|
Ye (Charlotte) Qi
|
eb1051fb95
|
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
|
2025-12-08 14:44:48 +00:00 |
|
Jiangyun Zhu
|
d143271234
|
[Bugfix] fix fuse_allreduce_rms when tp=1 (#30178)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-12-08 06:43:47 +00:00 |
|
ElizaWszola
|
af0444bf40
|
[Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 16:38:04 +00:00 |
|
Ilya Markov
|
4e26d3b09e
|
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
|
2025-12-05 18:17:32 +00:00 |
|
Angela Yi
|
e7296b08da
|
[bugfix] Pass globals to aot_compiled function (#29428)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-12-05 16:54:26 +00:00 |
|
Max Hu
|
c2894d3883
|
[Feature] Add Layer-wise NVTX Support (#29990)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
|
2025-12-05 11:20:07 +00:00 |
|
Laith Sakka
|
5867819eaf
|
Do not guard during noop elimination pass (#30095)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-05 04:10:12 +00:00 |
|
Laith Sakka
|
1f0d184590
|
[aot_compile]change VLLM backend to read fake args from example_value (#29104)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-04 17:33:45 -05:00 |
|
elvischenv
|
afe9eb408e
|
[Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-12-03 18:50:53 +00:00 |
|
Yong Hoon Shin
|
69520bc695
|
Add logging for cudagraph related info (#29825)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-12-03 01:01:48 -08:00 |
|
elvischenv
|
c719c40540
|
[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-03 05:15:50 +00:00 |
|
Arpit Khandelwal
|
d7284a2604
|
[Core] Rename PassConfig flags as per RFC #27995 (#29646)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-03 03:38:55 +00:00 |
|
Didier Durand
|
fae6943068
|
[Doc]: fixing typos in multiple files. (#29685)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-28 08:41:41 -08:00 |
|
Matthew Bonanni
|
430dd4d9eb
|
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-26 10:53:15 -07:00 |
|
George D. Torres
|
56531b79cc
|
[Misc] Add backup hash algorithm for FIPS constrained environments (#28795)
Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-26 00:50:22 +00:00 |
|
Ilya Markov
|
e7d776273d
|
[Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-11-25 19:58:56 +00:00 |
|
Icey
|
888152bf87
|
Allow oot custom compiler extension via CompilerInterface (#28623)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-11-25 15:25:15 +08:00 |
|
Laith Sakka
|
7a228b5305
|
Add option to use unbacked, and backed size obl dynamic shapes for more sound compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-24 10:12:41 -05:00 |
|
Yanan Cao
|
933f67ecd8
|
[Bugfix]Fix a conditional to not check zero value (#28754)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-21 19:59:07 -08:00 |
|
zhrrr
|
a982f5b5ea
|
[kernel][perf] support non-contiguous input for rms_norm kernel (#28103)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-20 19:39:09 -08:00 |
|
Driss Guessous
|
3fd74189db
|
Fixes bench (#29058)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-11-20 21:21:54 +00:00 |
|
vnadathur
|
1ffe934c8a
|
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-19 06:13:54 -08:00 |
|
Kunshang Ji
|
1b82fb0ad3
|
[XPU] work around for sp, avoid custom op import error (#28822)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-17 13:16:44 +00:00 |
|
Didier Durand
|
2bb4435cb7
|
[Doc]: fix typos in various files (#28567)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-15 19:27:50 +00:00 |
|
Angela Yi
|
f36292dbee
|
[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
|
2025-11-15 11:46:12 +00:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Yanan Cao
|
262d263f6c
|
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-13 15:09:05 -05:00 |
|
Yanan Cao
|
48c879369f
|
[Frontend] Change CompilationMode to a proper Enum (#28165)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-11 19:46:18 -05:00 |
|
zhrrr
|
68c09efc37
|
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-11 12:00:31 -05:00 |
|
Ilya Markov
|
d17ecc6b19
|
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-10 18:33:11 -05:00 |
|
Boyuan Feng
|
b158df2813
|
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-08 01:13:13 +00:00 |
|
gmagogsfm
|
002b07c4b2
|
[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-05 12:22:44 -05:00 |
|
Boyuan Feng
|
6ab183813c
|
[Graph Partition][Cache] Use inductor partition ops config (#27702)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-05 13:04:48 +00:00 |
|
ahao-anyscale
|
cac4c10ef0
|
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-11-03 11:13:51 -05:00 |
|
Lucas Kabela
|
94666612a9
|
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
|
2025-10-28 22:36:43 +00:00 |
|