Icey
|
888152bf87
|
Allow oot custom compiler extension via CompilerInterface (#28623)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-11-25 15:25:15 +08:00 |
|
Laith Sakka
|
7a228b5305
|
Add option to use unbacked, and backed size obl dynamic shapes for more sound compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-24 10:12:41 -05:00 |
|
Lucas Wilkinson
|
30d6466238
|
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens (#29102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-22 00:47:05 +00:00 |
|
Boyuan Feng
|
8c25f9cfb6
|
[BugFix] skip combo kernel on cpu (#29129)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-21 11:50:59 +08:00 |
|
Lucas Wilkinson
|
8f4f77a727
|
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-19 13:43:54 -08:00 |
|
vnadathur
|
1ffe934c8a
|
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-19 06:13:54 -08:00 |
|
Lucas Wilkinson
|
64e39d667c
|
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-17 09:41:22 -05:00 |
|
Roger Wang
|
d3387750f1
|
[Misc] Turn off encoder torch compile by default (#28634)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-13 08:38:08 -08:00 |
|
Harry Mellor
|
a742134cc5
|
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 16:10:28 +00:00 |
|
TJian
|
edb59a9470
|
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility (#28500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-11-12 05:01:14 -08:00 |
|
Yanan Cao
|
48c879369f
|
[Frontend] Change CompilationMode to a proper Enum (#28165)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-11 19:46:18 -05:00 |
|
zhrrr
|
68c09efc37
|
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-11 12:00:31 -05:00 |
|
Ilya Markov
|
d17ecc6b19
|
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-10 18:33:11 -05:00 |
|
Harry Mellor
|
c0a4b95d64
|
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 04:23:17 +00:00 |
|
Lucas Kabela
|
4bf56c79cc
|
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-07 00:16:03 +00:00 |
|
Vadim Gimpelson
|
b6a248bdd7
|
[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-05 17:01:12 -08:00 |
|
Vadim Gimpelson
|
d4e547bb7e
|
Revert "[PERF] Decouple projections from GDN custom op" (#28080)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 15:58:23 -08:00 |
|
Vadim Gimpelson
|
5fd8f02ea9
|
[PERF] Decouple projections from GDN custom op (#27512)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 08:11:41 -08:00 |
|
ahao-anyscale
|
cac4c10ef0
|
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-11-03 11:13:51 -05:00 |
|
Zhiyuan Li
|
4e68cc9b6a
|
[Model] Introduce Kimi Linear to vLLM (#27809)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
|
2025-10-30 21:02:27 +08:00 |
|
Lucas Kabela
|
94666612a9
|
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
|
2025-10-28 22:36:43 +00:00 |
|
fhl2000
|
284cc92275
|
[MISC] cudagraph_capture_sizes related improvements (#26016)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-24 05:11:05 -07:00 |
|
Andy Lo
|
b63f2143f8
|
[LoRA] LoRA cuda graph specialization (#25914)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-20 04:21:09 +00:00 |
|
Isotr0py
|
6ac5e06f7c
|
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-10-18 09:48:22 -07:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Boyuan Feng
|
ca683a2a72
|
use combo kernel to fuse qk-norm and qk-rope (#26682)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-14 09:40:59 -04:00 |
|
Morrison Turnansky
|
e3fdb627d9
|
[FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-13 22:47:16 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Angela Yi
|
01653a917b
|
[compile] Fix inductor partition config (#26645)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-11 21:03:14 +00:00 |
|
baonudesifeizhai
|
cddce79fda
|
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-10 16:35:28 +00:00 |
|
Lucas Wilkinson
|
29255cfc3b
|
[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-10-10 01:20:31 -04:00 |
|
Jiangyun Zhu
|
5728da11ea
|
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-09 05:43:55 -07:00 |
|
Naveenraj Kamalakannan
|
e614ab7806
|
Separate MLAAttention class from Attention (#25103)
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-08 17:11:11 -07:00 |
|
Morrison Turnansky
|
0c824fc46f
|
[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26113)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-07 12:53:43 -07:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Yongye Zhu
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
Jiangyun Zhu
|
43227236ec
|
[torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-09-29 13:54:52 +00:00 |
|
Jiangyun Zhu
|
c0ec81836f
|
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-27 16:09:00 +00:00 |
|
fhl2000
|
f075693da7
|
[V1] address post issues related to #20059 (part 1) (#23046)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-26 15:58:19 -04:00 |
|
Isotr0py
|
71b25b0d48
|
[V0 deprecation] Clean up V0 fallback in compilation config (#25675)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-25 17:29:51 +00:00 |
|
Michael Goin
|
24fab45d96
|
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-23 15:29:26 -04:00 |
|
Luka Govedič
|
d5e0fca264
|
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-22 12:30:05 -07:00 |
|
Boyuan Feng
|
8945b001db
|
[torch.compile] CUDAGraph Inductor partition integration (#24281)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Signed-off-by: boyuanfeng <boyuan@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-20 01:02:15 +00:00 |
|
Wentao Ye
|
d2a30a2d93
|
[Bug] Fix torch Compilation Cache Hit Error (#25093)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-18 12:38:37 -07:00 |
|
Tao He
|
e93f4cc9e3
|
Add the support for the qwen3 next model (a hybrid attention model). (#24526)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-11 15:32:09 +08:00 |
|
Wentao Ye
|
a55cf41a09
|
[Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123)
|
2025-09-09 10:21:10 -04:00 |
|
nopperl
|
fa4311d85f
|
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
|
2025-09-03 08:24:02 -07:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|