238 Commits

Author SHA1 Message Date
Luka Govedič
15f40b20aa [fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
2026-01-31 06:48:34 -08:00
Angela Yi
608b556507 [ez] Add structured torch.compile logs (#33213)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-01-31 21:00:54 +08:00
Rohan Potdar
59bcc5b6f2 Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-01-28 20:47:47 +00:00
Robert Shaw
af9b69f977 [Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 15:54:59 +00:00
Matthew Bonanni
a608b4c6c2 [5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 10:02:51 -05:00
Luka Govedič
bbbd696af9 [torch.compile][CI] Add back attn fusion on hopper/ada (#32940)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2026-01-23 16:49:20 +00:00
Xin Yang
90c2007932 [Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-23 14:34:30 +00:00
Luka Govedič
5e4e0e51f4 [torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-22 19:52:26 -08:00
Lucas Kabela
15e302dfce [Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-22 15:12:26 +00:00
Robert Shaw
42135d6898 [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
2026-01-21 08:22:33 -05:00
Lucas Wilkinson
2261340806 [Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-20 15:05:48 -05:00
dolpm
7c5dedc247 [AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
2026-01-20 19:45:59 +00:00
vllmellm
148117ea2e [Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-20 14:48:20 +08:00
Lucas Wilkinson
14ce524249 [CI] Breakup h200 tests (#30499)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-16 06:23:22 +00:00
dolpm
8471b27df9 [compile] raise on compile_size implicit padding (#32343)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
2026-01-14 20:46:56 +00:00
Lucas Kabela
ea6d067a2a [Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-09 22:01:38 -05:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Robert Shaw
5825bbc1f7 [Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-08 19:07:45 -05:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
DevByteAI
1f214290d6 fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 17:27:26 +08:00
Angela Yi
9a1d20a89c [CI] Add warmup run in test_fusion_attn (#31183)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-07 00:31:52 +00:00
Charlie Fu
c07163663d [ROCm][CI] Fix tests/compile unit tests (#28895)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-06 18:50:43 +00:00
wangxiyuan
bb4337b34c [Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
Boyuan Feng
2f12cd32c0 [BugFix] Fix cache issue in compilation_config (#31376)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-27 09:30:39 -05:00
baonudesifeizhai
8711b21676 Fix/get raw stream patch #30905 (#30912)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-26 20:08:47 -08:00
vllmellm
f32cfd7d97 [ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-23 02:07:54 -08:00
Cyrus Leung
8cef137689 [Chore] Update more locations to use attention_config.backend (#31153)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-22 19:19:50 -08:00
Angela Yi
612d5ffdab [ci] Fix Pytorch compilation test oom in 2.10 (#31194)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-12-23 01:56:47 +00:00
Lucas Wilkinson
30bb19a760 [BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Zhengxu Chen
5f2f3fba1d [compile] Fix CI for test_gpt2_cache_hit (#30902)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-17 20:22:23 -08:00
Matthew Bonanni
7eb6cb6c18 [Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Cyrus Leung
44d3b1df3d [CI/Build] Fix compatibility between #30244 and #30396 (#30787)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-16 20:21:19 -08:00
Wentao Ye
b6ec077e05 [CI] Skip ci failure test (#30804)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 22:47:53 +00:00
ElizaWszola
994acec0cc [Bugfix] Fix fusion for VL models (#30244)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-12-14 21:22:37 +08:00
Laith Sakka
f569c654e1 enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-14 08:14:06 +00:00
Roberto L. Castro
4fa7ce46f3 [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
rasmith
48661d275f [CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-12 00:24:20 +00:00
Harry Mellor
cf3eacfe58 Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Cyrus Leung
5a87d8b9b1 [Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Charlie Fu
3c680f4a17 [Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-12-09 22:39:26 +00:00
Ilya Markov
0b6a8a304c [BugFix] Fix non detected failing tests (#30277)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-09 17:57:55 +00:00
Yanan Cao
7b35011ad1 Mark qwen2_5_vl as xfail (#30283)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-09 01:14:10 +00:00
Laith Sakka
87aee9ed2b Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
ElizaWszola
af0444bf40 [Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 16:38:04 +00:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Wentao Ye
17eb25e327 [Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
Ilya Markov
4e26d3b09e [Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5 [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Laith Sakka
1f0d184590 [aot_compile]change VLLM backend to read fake args from example_value (#29104)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-04 17:33:45 -05:00