Jiangyun Zhu
|
d143271234
|
[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-12-08 06:43:47 +00:00 |
|
ElizaWszola
|
af0444bf40
|
[Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 16:38:04 +00:00 |
|
Ilya Markov
|
4e26d3b09e
|
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
|
2025-12-05 18:17:32 +00:00 |
|
Angela Yi
|
e7296b08da
|
[bugfix] Pass globals to aot_compiled function (#29428)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-12-05 16:54:26 +00:00 |
|
Max Hu
|
c2894d3883
|
[Feature] Add Layer-wise NVTX Support (#29990)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
|
2025-12-05 11:20:07 +00:00 |
|
Laith Sakka
|
5867819eaf
|
Do not guard during noop elimination pass (#30095)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-05 04:10:12 +00:00 |
|
Laith Sakka
|
1f0d184590
|
[aot_compile]change VLLM backend to read fake args from example_value (#29104)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-04 17:33:45 -05:00 |
|
elvischenv
|
afe9eb408e
|
[Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-12-03 18:50:53 +00:00 |
|
Yong Hoon Shin
|
69520bc695
|
Add logging for cudagraph related info (#29825)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-12-03 01:01:48 -08:00 |
|
elvischenv
|
c719c40540
|
[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-03 05:15:50 +00:00 |
|
Arpit Khandelwal
|
d7284a2604
|
[Core] Rename PassConfig flags as per RFC #27995 (#29646)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-03 03:38:55 +00:00 |
|
Didier Durand
|
fae6943068
|
[Doc]: fixing typos in multiple files. (#29685)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-28 08:41:41 -08:00 |
|
Matthew Bonanni
|
430dd4d9eb
|
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-26 10:53:15 -07:00 |
|
George D. Torres
|
56531b79cc
|
[Misc] Add backup hash algorithm for FIPS constrained environments (#28795)
Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-26 00:50:22 +00:00 |
|
Ilya Markov
|
e7d776273d
|
[Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-11-25 19:58:56 +00:00 |
|
Icey
|
888152bf87
|
Allow oot custom compiler extension via CompilerInterface (#28623)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-11-25 15:25:15 +08:00 |
|
Laith Sakka
|
7a228b5305
|
Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-24 10:12:41 -05:00 |
|
Yanan Cao
|
933f67ecd8
|
[Bugfix]Fix a conditional to not check zero value (#28754)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-21 19:59:07 -08:00 |
|
zhrrr
|
a982f5b5ea
|
[kernel][perf] support uncontiguous input for rms_norm kernel (#28103)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-20 19:39:09 -08:00 |
|
Driss Guessous
|
3fd74189db
|
Fixes bench (#29058)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-11-20 21:21:54 +00:00 |
|
vnadathur
|
1ffe934c8a
|
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-19 06:13:54 -08:00 |
|
Kunshang Ji
|
1b82fb0ad3
|
[XPU] work around for sp, avoid custom op import error (#28822)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-17 13:16:44 +00:00 |
|
Didier Durand
|
2bb4435cb7
|
[Doc]: fix typos in various files (#28567)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-15 19:27:50 +00:00 |
|
Angela Yi
|
f36292dbee
|
[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
|
2025-11-15 11:46:12 +00:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Yanan Cao
|
262d263f6c
|
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-13 15:09:05 -05:00 |
|
Yanan Cao
|
48c879369f
|
[Frontend] Change CompilationMode to a proper Enum (#28165)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-11 19:46:18 -05:00 |
|
zhrrr
|
68c09efc37
|
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-11 12:00:31 -05:00 |
|
Ilya Markov
|
d17ecc6b19
|
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-10 18:33:11 -05:00 |
|
Boyuan Feng
|
b158df2813
|
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-08 01:13:13 +00:00 |
|
gmagogsfm
|
002b07c4b2
|
[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-05 12:22:44 -05:00 |
|
Boyuan Feng
|
6ab183813c
|
[Graph Partition][Cache] Use inductor partition ops config (#27702)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-05 13:04:48 +00:00 |
|
ahao-anyscale
|
cac4c10ef0
|
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-11-03 11:13:51 -05:00 |
|
Lucas Kabela
|
94666612a9
|
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
|
2025-10-28 22:36:43 +00:00 |
|
Zhengxu Chen
|
e3d8186666
|
[compile] Add fallback path to AOT compile when serialization fails. (#27350)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-28 12:54:26 -04:00 |
|
Zhengxu Chen
|
a00d6254e9
|
[compile] Disable dynamo guards check for AOT compilation. (#27288)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-28 12:58:12 +00:00 |
|
Yeshwanth N
|
71b1c8b667
|
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
|
2025-10-26 04:03:32 -07:00 |
|
Wentao Ye
|
52efc34ebf
|
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-24 19:27:04 -04:00 |
|
dongbo910220
|
a0003b56b0
|
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 20:25:25 +00:00 |
|
Jiangyun Zhu
|
ab3e80042e
|
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-22 00:22:39 -04:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
Luka Govedič
|
bd7157a071
|
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-17 08:10:23 -06:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Lucia Fang
|
11ae016bd7
|
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-17 00:01:52 +00:00 |
|
Bram Wasti
|
7d8975de84
|
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-15 22:06:02 -07:00 |
|
Richard Zou
|
9b6504c307
|
[BugFix] Work around graph partition x torch.compile cache issue (#26956)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-10-15 20:06:11 -07:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Angela Yi
|
b59dd19b55
|
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-13 18:15:34 -07:00 |
|