Ikenna
|
906077181b
|
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-07 02:27:33 +00:00 |
|
Wentao Ye
|
67a746e87f
|
[Log] Optimize duplicate startup log (#33944)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 17:49:56 +00:00 |
|
Luka Govedič
|
ac32e66cf9
|
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-06 04:19:49 -08:00 |
|
emricksini-h
|
325ab6b0a8
|
[Feature] OTEL tracing during loading (#31162)
|
2026-02-05 16:59:28 -08:00 |
|
Richard Zou
|
9f14c9224d
|
Revert "[torch.compile] Significantly speed up cold start times" (#33820)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-04 21:59:59 +00:00 |
|
Zhengxu Chen
|
bcd2f74c0d
|
[compile] Clean up AOT compile bypass on evaluate_guards. (#33578)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-04 02:12:53 -08:00 |
|
Wentao Ye
|
5e1e0a0fbd
|
[Refactor] Remove unused dead code (#33718)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-03 21:25:11 -08:00 |
|
Richard Zou
|
b1bb18de8d
|
[torch.compile] Significantly speed up cold start times (#33641)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-03 09:12:11 -08:00 |
|
Richard Zou
|
fd9c83d0e0
|
[torch.compile] Document the workaround to standalone_compile failing (#33571)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-03 07:16:55 +00:00 |
|
Luka Govedič
|
15f40b20aa
|
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
|
2026-01-31 06:48:34 -08:00 |
|
Angela Yi
|
608b556507
|
[ez] Add structured torch.compile logs (#33213)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-31 21:00:54 +08:00 |
|
Angela Yi
|
5a66c9cc76
|
[ez] Delete torch25_custom_graph_pass (#33287)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 16:47:05 +00:00 |
|
Angela Yi
|
07ea184f00
|
[ez] Delete more torch version checks <= 2.8 (#33288)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 05:28:46 +00:00 |
|
Angela Yi
|
4197168ea5
|
[ez] Remove checks for torch version <= 2.8 (#33209)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-28 16:03:56 -05:00 |
|
Rohan Potdar
|
59bcc5b6f2
|
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-01-28 20:47:47 +00:00 |
|
Harry Mellor
|
f1acbd68c5
|
[CI] Enable mypy import following for vllm/compilation (#33199)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 08:59:54 +00:00 |
|
Harry Mellor
|
2eb673a088
|
Add flake8-implicit-str-concat rules to Ruff (#33191)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 04:56:10 +00:00 |
|
Matthew Bonanni
|
a608b4c6c2
|
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-27 10:02:51 -05:00 |
|
Roberto L. Castro
|
fcb9df99bd
|
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-24 18:45:27 -07:00 |
|
Luka Govedič
|
5e4e0e51f4
|
[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-22 19:52:26 -08:00 |
|
Lucas Kabela
|
15e302dfce
|
[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-22 15:12:26 +00:00 |
|
Robert Shaw
|
42135d6898
|
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
|
2026-01-21 08:22:33 -05:00 |
|
Lucas Wilkinson
|
2261340806
|
[Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-20 15:05:48 -05:00 |
|
dolpm
|
7c5dedc247
|
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-20 19:45:59 +00:00 |
|
Nicolò Lucchesi
|
74c583bc50
|
[Core] Whisper support torch.compile (#30385)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-19 10:02:31 +00:00 |
|
Richard Zou
|
bd292be0c0
|
[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-15 20:01:41 +00:00 |
|
Angela Yi
|
7933638051
|
[misc] Remove is_torch_equal_or_newer(2.4) cases (#32296)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-13 23:22:07 -08:00 |
|
cjackal
|
15b33ff064
|
[Misc] improve warning/assert messages (#32226)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2026-01-13 03:11:23 +00:00 |
|
Lucas Kabela
|
ad8818bb5e
|
[Misc][BE] Type coverage for vllm/compilation [3/3] (#31748)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-12 19:24:38 +00:00 |
|
Laith Sakka
|
46eb30f519
|
make assume_32_bit_indexing configurable (#32044)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-01-10 23:15:46 -08:00 |
|
Kevin McKay
|
4dc0d606b7
|
[Bugfix] Narrow broad exceptions in compilation backends (#31616)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-09 21:39:22 -05:00 |
|
Lucas Kabela
|
aaf4b70aae
|
[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744)
|
2026-01-09 18:30:38 -05:00 |
|
DevByteAI
|
1f214290d6
|
fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 17:27:26 +08:00 |
|
Lucas Kabela
|
873480d133
|
[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-06 20:37:51 -05:00 |
|
Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Amir Samani
|
030fc44914
|
use the same stream for cuda graph capture and replay for NCCL (#29207)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-25 19:10:03 +08:00 |
|
vllmellm
|
f32cfd7d97
|
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-23 02:07:54 -08:00 |
|
Lucas Kabela
|
0db5439ded
|
[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 08:23:31 -08:00 |
|
Zhengxu Chen
|
53cd7f868b
|
[compile] Recompile graph module during Dynamo cache loading. (#30743)
Signed-off-by: Zhengxu Chen <zhxchen17@fb.com>
|
2025-12-17 02:00:12 -08:00 |
|
Zhengxu Chen
|
177c391db2
|
[compile] Disable aot when eager backend is used. (#30810)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 01:55:56 -08:00 |
|
ElizaWszola
|
994acec0cc
|
[Bugfix] Fix fusion for VL models (#30244)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-12-14 21:22:37 +08:00 |
|
Ilya Markov
|
3224ea9915
|
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-14 18:15:11 +08:00 |
|
Laith Sakka
|
f569c654e1
|
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-14 08:14:06 +00:00 |
|
Laith Sakka
|
763963aa73
|
set assume_32bit_indexing and pass unbacked hints (#30459)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-13 15:36:53 +00:00 |
|
Zhengxu Chen
|
fe1787107e
|
[compile] Parse compile range cache keys as Range during cache loading. (#30516)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-12 04:30:51 +00:00 |
|
Zhengxu Chen
|
92fea56fd1
|
[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-11 20:28:03 +00:00 |
|
Charlie Fu
|
3c680f4a17
|
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-12-09 22:39:26 +00:00 |
|
Ilya Markov
|
0b6a8a304c
|
[BugFix] Fix non detected failing tests (#30277)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-09 17:57:55 +00:00 |
|
Laith Sakka
|
87aee9ed2b
|
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-08 10:46:15 -05:00 |
|
Ye (Charlotte) Qi
|
eb1051fb95
|
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
|
2025-12-08 14:44:48 +00:00 |
|